On June 1, 2026, every GitHub Copilot plan moved from premium request limits to token-metered AI Credits. On the surface, this looks like a pricing tweak. The same $10, $19 and $39 price tags as before. GitHub framed it as a new included credit allowance, pooled across your organisation, with admin controls to manage the rest.
But underneath that calm announcement is an admission that GitHub had been losing money on its heaviest users, and that is a common theme for AI companies. This is not really a story about Copilot. It is a preview of what is coming to every AI tool your enterprise has bought a flat-rate licence for.
Why Flat-Rate AI Pricing Was Always Temporary
Flat-rate subscriptions work when usage is roughly uniform across your user base. AI inference does not behave like that. Every interaction has a direct, variable cost in tokens, and the spread between a light user and a heavy user is not 2x or 3x. It is 50x, 100x, sometimes more.
GitHub's old Premium Request Unit system was the patch for this. Heavy users would get quietly downgraded to a fallback model once they hit their cap, a way of capping cost without admitting the subscription could not cover it. This patchwork solution was effective enough until it was broken again by the introduction of agents.
A chat question and an autonomous multi-hour coding session used to cost the user the same $10. One of those consumes a few hundred tokens. The other can consume hundreds of thousands by reading files, planning, writing code, running tests, iterating and opening a pull request. GitHub was absorbing that gap at scale, across millions of seats, as agentic features became the default way people used the product.
Why This Is an Industry Pattern
GitHub Copilot is the first major consumer-facing AI product to make this transition mandatory, but it is not an outlier. It is a leading indicator.
Look at where the rest of the market already sits. Azure OpenAI and every major foundation model API have always been usage-based. Microsoft 365 Copilot still sells flat per-seat licences, but it has already quietly introduced a pay-as-you-go tier for specific agent scenarios. That is the same precursor move GitHub made with Premium Request Units before this announcement.
The pattern is consistent: flat-rate pricing is the customer acquisition phase, and usage-based pricing is the maturity phase. With the core value of AI broadening rapidly, it only makes sense to re-evaluate the pricing too.
Copilot crossed that line first because coding agents matured faster than agents in almost any other category. Every other AI tool with an agentic roadmap is on the same trajectory, just earlier in the cycle.
What This Means for Enterprises
Every organisation already has a handful of developers who live in Copilot and run agents on every task, every commit, every refactor. Under the old model, they cost the same $10–$39 as everyone else and were getting great value. Under the new one, they need to use AI deliberately. As agentic features get better and developers lean on them more, that small group's consumption compounds.
At the same time, the old problem has not gone anywhere. It is just better hidden. Flat-rate licensing always had a quiet leak that hurt enterprises the most: seats assigned to people who barely use the tool are still billed at full price. The Copilot usage-based pricing model does not fix that. An inactive seat still costs the same, but the focus may shift towards the new problem of large individual users. Since the heavy users are still getting value, the main problem for enterprises remains the unused or underused seats.
So the spend story has two halves, both heading the wrong way: a small group of users whose cost will keep climbing as agentic usage deepens, and a larger group of inactive or near-inactive seats whose waste may continue to be overlooked. Without visibility into both halves separately, money allocated to the AI budget is having no impact.
How AI Intelligence Helps
GitHub's new admin controls are a genuine improvement on the old Premium Request Unit system, and worth using. But they only tell you that you have spent the money. They do not tell you why, who, or whether it was worth it. Solving the two-sided problem, escalating power users on one end and quietly wasted seats on the other, needs a three-part solution.
1. Usage at the level it actually varies.
A single org-wide number is not actionable. The moment you can see that one workflow consumes 80% of your credit pool, or that a given seat has not touched the tool in six weeks, you have somewhere to start. Without that breakdown, every cost conversation is a guess dressed up as a decision.
2. Consumption spikes before the bill arrives.
Provider billing dashboards typically lag actual usage by hours. An agentic loop burning through a month's allotment in an afternoon will not show up until the damage is done. Real-time visibility is the difference between catching a problem and explaining one.
3. Model choice as a visible decision, not a default.
Under flat-rate, nobody asked whether a task needed the frontier model or whether a cheaper one would do. There was no reason to. Under usage-based billing, that question now has a dollar value attached every time it is skipped.
PromptLeash was built to target the whole problem. We act as a layer between your teams and the AI providers they are using, giving you per-agent and per-workflow visibility into where credits are going, real-time alerts when consumption spikes outside normal patterns, and the data to make informed calls about which tasks justify which models, while also surfacing the seats that are quietly going unused.
GitHub's pooled credits and admin caps are tools for controlling the blast radius after the fact. PromptLeash is how you see both sides of the spend before they show up as a single number on an invoice.