# 5 LLM Cost Metrics Every CFO Should See on Monday Morning

> Cost Per Completion · Token Waste Ratio · Model-Task Mismatch · Agent Loop Depth · Cost Per Business Outcome

*Published 2025-02-17 · 10 min read · canonical: https://promptleash.com/blog/5-llm-cost-metrics-every-cfo-should-see*

_Practitioner perspective. Scenarios and figures are illustrative unless a source is linked._

Your CFO can tell you, to the cent, what the company spent on cloud infrastructure last quarter. They can break down headcount costs by department, software licences by vendor, and travel expenses by region. Ask them what the company spent on AI last month, and you will likely get a blank stare, or worse, a confident answer that is catastrophically wrong.

This is not their fault. The tooling does not exist in most organisations to make AI spend visible at the executive level. What finance teams typically see is a single line item on an AWS or Azure invoice labelled something like “Amazon Bedrock: $47,312.89.” That number tells you almost nothing. It does not tell you whether that money was well spent, which teams drove the cost, or whether you could achieve the same outcomes for half the price.

The companies that will dominate the next phase of AI adoption are the ones that treat LLM spend with the same rigour they apply to every other operational cost. That starts with surfacing the right metrics to the right people.

Here are the five numbers your CFO should be looking at every Monday morning.

## 1. Cost Per Successful Completion

**What it is:** The total LLM spend divided by the number of requests that actually achieved their intended purpose.

**Why it matters:** Raw API spend is a vanity metric. A $50,000 monthly bill means nothing without context. If your customer support agent resolved 200,000 tickets with that spend, you are paying $0.25 per resolution, almost certainly cheaper than a human agent. If it resolved 500 tickets because the other 199,500 failed, hallucinated, or required human escalation anyway, you are paying $100 per resolution and would have been better off hiring temps.

Cost Per Successful Completion forces your teams to define what “success” actually means for each AI use case, and then measure whether they are achieving it economically. It is the single most important metric for determining whether your AI investment is generating returns or burning runway.

**What good looks like:** This varies wildly by use case, but the trend matters more than the absolute number. If this metric is climbing week over week, something is degrading: prompt quality may have slipped, model performance may have changed after an update, or incoming requests may have grown more complex.
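For teams wiring this into a dashboard, here is a minimal Python sketch of the calculation. It assumes a hypothetical per-request log where each record carries its cost and a team-defined `succeeded` flag. The `Request` type and its field names are illustrative, not taken from any particular tool. Failed requests stay in the spend total, which is exactly what makes the metric honest.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One LLM request from a hypothetical usage log (illustrative schema)."""
    cost_usd: float   # everything this request cost, retries included
    succeeded: bool   # did it achieve its intended purpose? (team-defined)


def cost_per_successful_completion(requests: list[Request]) -> float:
    """Total spend divided by the number of requests that succeeded.

    Failed requests still count toward spend, so failures push the metric up.
    """
    total_cost = sum(r.cost_usd for r in requests)
    successes = sum(1 for r in requests if r.succeeded)
    if successes == 0:
        return float("inf")  # money spent, nothing delivered
    return total_cost / successes


# The two scenarios above: $50,000 spread evenly over 200,000 requests.
per_request = 50_000 / 200_000
all_succeed = [Request(per_request, True)] * 200_000
mostly_fail = [Request(per_request, True)] * 500 + [Request(per_request, False)] * 199_500

print(cost_per_successful_completion(all_succeed))  # 0.25
print(cost_per_successful_completion(mostly_fail))  # 100.0
```

Same bill, same request volume; only the definition of success separates a $0.25 unit cost from a $100 one.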

## 2. Token Waste Ratio

**What it is:** The percentage of tokens consumed that did not contribute to the final output delivered to the user or downstream system.

**Why it matters:** Retries, failed function calls, overstuffed context windows, and verbose system prompts can all create avoidable cost. In a RAG application, retrieved context should be tested for whether it contributes to the final output rather than assumed to be useful.

Token Waste Ratio makes this invisible tax visible. When your CFO sees that 72% of tokens are being wasted, they will ask the obvious question: “Can we get that number down?” And the answer, almost always, is yes.

**What good looks like:** Establish an internal baseline by use case, test whether context contributes to output quality, and investigate material deterioration or unexplained variance.
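One way to compute the ratio, sketched in Python under two explicit assumptions. First, every token spent on a call whose output was never delivered (a retry, a failed function call) counts as waste. Second, for the delivered call, some upstream step has already tagged how many input tokens of context did not contribute to the output. Making that judgement is the hard part and is not shown here. The `Call` schema is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Token usage for one LLM call (hypothetical log schema)."""
    input_tokens: int
    output_tokens: int
    delivered: bool                 # was this call's output what the user/system received?
    unused_context_tokens: int = 0  # input tokens judged not to contribute (tagged upstream)


def token_waste_ratio(calls: list[Call]) -> float:
    """Fraction of all tokens that did not contribute to delivered output."""
    total = sum(c.input_tokens + c.output_tokens for c in calls)
    if total == 0:
        return 0.0
    wasted = 0
    for c in calls:
        if not c.delivered:
            # Retries and failed calls: every token was spent for nothing.
            wasted += c.input_tokens + c.output_tokens
        else:
            # Delivered call: only context judged irrelevant counts as waste.
            wasted += c.unused_context_tokens
    return wasted / total


# A RAG request whose first attempt failed and was retried.
calls = [
    Call(input_tokens=4_000, output_tokens=500, delivered=False),
    Call(input_tokens=4_000, output_tokens=500, delivered=True, unused_context_tokens=2_500),
]
print(f"{token_waste_ratio(calls):.0%}")  # 78%
```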

## 3. Model-Task Mismatch Rate

**What it is:** The percentage of API calls where the model used was more capable (and expensive) than the task required.

**Why it matters:** This is the metric that directly quantifies the “Model Laziness” problem: defaulting to the most capable model for every request instead of matching the model to the task. If your engineering team is routing every request to GPT-5 or Claude Opus regardless of complexity, your Mismatch Rate will be sky-high, and so will your bill.

**What good looks like:** Define an acceptable mismatch rate for each use case, based on task requirements and tested output quality. A material mismatch rate signals a clear opportunity to investigate and optimise routing.
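Computing the rate is simple once each call is labelled with the cheapest tier that meets the quality bar for its task. Producing that label (for example, from offline evaluations) is where the real work lies. Below is a minimal Python sketch that uses hypothetical tier names rather than real model identifiers.

```python
# Hypothetical capability tiers; a higher number means more capable and more expensive.
TIER = {"small-model": 1, "mid-model": 2, "frontier-model": 3}


def mismatch_rate(calls: list[tuple[str, int]]) -> float:
    """Share of calls whose model tier exceeds the tier the task required.

    Each call is (model_used, required_tier). How required_tier is determined
    (e.g. offline evals showing a cheaper tier meets the quality bar) is assumed.
    """
    if not calls:
        return 0.0
    over = sum(1 for model, required in calls if TIER[model] > required)
    return over / len(calls)


calls = [
    ("frontier-model", 1),  # simple classification sent to the frontier model
    ("frontier-model", 3),  # genuinely hard reasoning task
    ("small-model", 1),     # correctly routed
    ("frontier-model", 2),  # summarisation a mid-tier model could handle
]
print(f"{mismatch_rate(calls):.0%}")  # 50%
```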

## 4. Agent Loop Depth

**What it is:** The average number of LLM calls an autonomous agent makes before completing (or abandoning) a task.

**Why it matters:** Agentic AI is one of the fastest-growing cost vectors in enterprise AI, and among the hardest to predict. A well-designed agent might resolve a task in 3–5 LLM calls. A poorly designed one, or one encountering an edge case, might spiral into 50, 100, or 500 calls before hitting a timeout, each one burning tokens, often at frontier-model prices.

Even a modest increase in average loop depth, from 5 to 8 calls per task, represents a 60% cost increase (assuming a roughly constant cost per call), and it will never show up as such in a standard cloud bill. It will just look like “AI costs went up.”

**What good looks like:** Establish baselines per agent type and track the distribution, not just the average. A mean of 6 with a max of 200 tells a very different story than a mean of 6 with a max of 9. The outliers are where the money hides.
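Both points can be made concrete with a short Python sketch over made-up task logs. It shows two agents with the same mean depth of 6 but very different worst cases, plus the 5-to-8 arithmetic. The helper names are hypothetical. `depth_cost_change` assumes every call costs about the same, which is rarely exactly true when context grows over a loop.

```python
import statistics


def loop_depth_summary(calls_per_task: list[int]) -> dict[str, float]:
    """Summarise LLM calls per agent task; the mean alone hides runaway loops."""
    return {
        "mean": statistics.mean(calls_per_task),
        "median": statistics.median(calls_per_task),
        "max": max(calls_per_task),
    }


def depth_cost_change(old_depth: float, new_depth: float) -> float:
    """Relative cost change from a depth change, assuming a constant cost per call."""
    return (new_depth - old_depth) / old_depth


steady = [5, 6, 6, 7, 9, 3]   # mean 6, max 9
runaway = [4] * 97 + [200]    # mean 6, but one task spiralled to 200 calls

print(loop_depth_summary(steady))
print(loop_depth_summary(runaway))
print(f"{depth_cost_change(5, 8):.0%}")  # 60%
```

The runaway agent's median of 4 says the typical task is fine. Its max of 200 says where the money went.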

## 5. Cost Per Business Outcome

**What it is:** The total AI spend attributed to producing a specific, measurable business result: a closed support ticket, a processed invoice, a generated lead, a completed code review.

**Why it matters:** This is the metric that bridges the gap between engineering and the boardroom. Your CFO does not care about tokens. They do not care about model versions. They care about unit economics. If AI-powered invoice processing costs $0.12 per invoice compared to $4.50 for manual processing, that is a story the board understands.

| **Business Outcome** | **AI Cost** | **Manual Cost** | **Savings** |
| --- | --- | --- | --- |
| Support ticket resolved | $0.25 | $12.00 | **97.9%** |
| Invoice processed | $0.12 | $4.50 | **97.3%** |
| Code review completed | $2.30 | $35.00 | **93.4%** |
| Contract clause flagged | $0.85 | $18.00 | **95.3%** |

Without this metric, AI remains a mysterious line item that finance tolerates during good times and targets during cost cuts. With it, AI becomes a quantifiable investment with measurable returns, which is exactly what it should be.
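The Savings column in the table above is one minus the ratio of AI cost to manual cost. A few lines of Python reproduce it from the table's illustrative figures. The `savings_pct` helper is hypothetical.

```python
def savings_pct(ai_cost: float, manual_cost: float) -> float:
    """Percentage saved by the AI path relative to manual processing."""
    return (1 - ai_cost / manual_cost) * 100


# (AI cost, manual cost) per outcome, in USD, from the illustrative table.
outcomes = {
    "Support ticket resolved": (0.25, 12.00),
    "Invoice processed": (0.12, 4.50),
    "Code review completed": (2.30, 35.00),
    "Contract clause flagged": (0.85, 18.00),
}
for name, (ai, manual) in outcomes.items():
    print(f"{name}: {savings_pct(ai, manual):.1f}%")
```

The AI cost side is only as trustworthy as the attribution behind it. Every call, retry and agent loop that contributed to an outcome has to be counted against that outcome.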

## Putting It All Together: The Monday Morning Dashboard

None of these metrics are useful in isolation. The power comes from seeing them together, on a single screen, every Monday morning.

| **Metric** | **This Week** | **Last Week** | **Trend** |
| --- | --- | --- | --- |
| **Cost / Successful Completion** | $0.31 | $0.28 | **↑ 10.7%** |
| **Token Waste Ratio** | 58% | 62% | **↓ 4pts** |
| **Model-Task Mismatch** | 34% | 41% | **↓ 7pts** |
| **Avg Agent Loop Depth** | 6.2 | 5.8 | **↑ 6.9%** |
| **Cost / Business Outcome** | $0.47 | $0.52 | **↓ 9.6%** |

A dashboard like this tells a complete story in thirty seconds. The CFO can immediately see that while overall cost per business outcome is improving (good), the cost per completion is creeping up and agent loop depth is rising (investigate). Token waste is improving, probably because the team just optimised their RAG pipeline. Model mismatch is down, suggesting that routing improvements are taking effect.

This is the kind of operational intelligence that turns AI from a speculative expense into a managed investment.
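The dashboard's Trend column follows two conventions. Metrics that are already percentages move by percentage points, and everything else moves by relative percent. Here is a minimal Python sketch that reproduces the column from the illustrative figures. `trend` is a hypothetical helper. Whether an arrow is good news depends on the metric, and the function deliberately leaves that judgement to the reader.

```python
def trend(this_week: float, last_week: float, in_points: bool = False) -> str:
    """Format a week-over-week change as an arrow plus magnitude.

    Percentage metrics (waste, mismatch) use percentage points; others use relative change.
    """
    if in_points:
        change = this_week - last_week
        magnitude = f"{abs(change):g}pts"
    else:
        change = (this_week - last_week) / last_week
        magnitude = f"{abs(change):.1%}"
    arrow = "↑" if change > 0 else "↓" if change < 0 else "→"
    return f"{arrow} {magnitude}"


rows = [
    ("Cost / Successful Completion", 0.31, 0.28, False),
    ("Token Waste Ratio (%)", 58, 62, True),
    ("Model-Task Mismatch (%)", 34, 41, True),
    ("Avg Agent Loop Depth", 6.2, 5.8, False),
    ("Cost / Business Outcome", 0.47, 0.52, False),
]
for name, now, before, in_points in rows:
    print(f"{name:<30} {trend(now, before, in_points)}")
```

Using points for ratio metrics avoids a misleading headline. A drop from 62% to 58% waste is 4 points, not a "6.5% improvement".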

## Your AI Bill Deserves the Same Scrutiny as Your Headcount

We have entered an era where AI spend may rival, and for some companies exceed, traditional cloud infrastructure costs. The organisations that thrive will not be the ones that spend the most on AI. They will be the ones that spend the most intelligently.

That starts with measurement. You cannot optimise what you cannot see, and many leaders still lack task-level evidence they can act on.

These five metrics give your finance team, your engineering leadership, and your board a shared language for discussing AI economics. They transform vague concerns about “AI costs going up” into specific, actionable insights that drive real optimisation.
