Optimising Claude Code Costs Without Impacting Developer Velocity
How to control enterprise AI costs without slowing engineering teams
Engineering teams are adopting tools like Claude faster than organisations can govern them. The result? Rising AI costs, limited visibility, and growing pressure on leadership to control spend, without slowing delivery.
It is clear that Claude Code is a revolutionary technology. Everyone in tech is talking about it. Whether or not your InfoSec team has approved it, people are using it. It is here to stay.
Anthropic knows that. It has created a product that could be the catalyst for the next industrial revolution, and its year-on-year growth stands at ~4,700% (especially remarkable when compared to Google’s ~20%).
Anthropic and its investors will continue to spend billions on becoming the frontier firm in the AI space. Its recent funding round valued it at $965B, which eclipses the market caps of Walmart ($947B) and JPMorgan Chase ($802B) and is roughly three times that of the FTSE 100’s most valuable company, HSBC (~$325B). These figures show why this is arguably the most capital-intensive technology race in history.
But Anthropic needs to deliver a profit, so pricing is changing. It is shifting towards usage-based billing on token consumption. The days of tokenmaxxing on a Claude Max plan are over.
The challenge now facing CTOs is the classic trade-off between cost, performance and security.
The question is simple – how do you optimise the cost of Claude without impacting the delivery velocity it brings to your teams?
The New Cost Paradigm
Why AI token usage is driving unpredictable engineering costs
Optimising Claude without impacting velocity starts with recognising that cost is no longer fixed – it now scales with how your engineers use the tool. Going from 100 to 1,000 users will not scale costs linearly. As engineers become adept at building agentic workflows, with agent-to-agent interactions, cost can multiply rapidly, with radical week-on-week swings.
The challenge for engineering leaders is not to restrict usage, but to optimise it intelligently. Those who succeed are introducing day-one visibility, rightsizing models, implementing usage guardrails and accounting for token costs within IT budgets.
Done correctly, these approaches help organisations reduce unnecessary AI spend, improve cost visibility, and maintain engineering throughput.
Below are eight practical ways to optimise Claude usage within your organisation.
- Structure & Govern Workspaces
Every Claude organisation includes a default workspace, but it should be treated like a root account. It is essential for control, but engineers should not be using it for sessions or agents.
Start with the operating model, not the tooling. Structure your workspaces to mirror your engineering team structure, whether that’s by teams, products or departments. For example, separating “Platform Engineering”, “Product Teams”, and “AI Experiments” ensures cost, usage, and accountability can be mapped cleanly back to specific initiatives.
Limit workspace creation to admins, assign clear owners and enforce a consistent naming standard from day one. Done badly, with everyone working in a shared default workspace, governance will be fragmented and unravelling the mess will be worse than walking into a multi-cloud environment with no tagging taxonomy.
- Structured Cost Reporting
Clear cost reporting starts with identity and routing. With separate Claude workspaces for teams, issue workspace-scoped API keys or federated identities so every request is tied to a defined owner or cost centre.
To go further, route traffic through a central gateway to enforce authentication, apply budgets and capture usage data consistently. Additionally, Claude’s Usage & Cost API can be used to report back on spend by workspace, API key, or user account.
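A minimal sketch of that kind of roll-up, assuming usage records have already been exported from a gateway or a usage report. The record shape, field names, workspace names and cost centre codes are all hypothetical; this is not the response format of any Anthropic API.

```python
from collections import defaultdict

# Hypothetical usage records for illustration only. Real exports will
# use different field names and carry more detail (model, tokens, dates).
records = [
    {"workspace": "platform-engineering", "api_key": "key-a", "cost_usd": 12.50},
    {"workspace": "product-teams", "api_key": "key-b", "cost_usd": 30.00},
    {"workspace": "platform-engineering", "api_key": "key-c", "cost_usd": 7.25},
    {"workspace": "ai-experiments", "api_key": "key-d", "cost_usd": 4.00},
]

# Hypothetical mapping from workspace to the cost centre that owns it.
COST_CENTRES = {
    "platform-engineering": "CC-100",
    "product-teams": "CC-200",
    "ai-experiments": "CC-300",
}


def spend_by(usage_records, field):
    """Sum cost_usd over the records, grouped by the given field."""
    totals = defaultdict(float)
    for record in usage_records:
        totals[record[field]] += record["cost_usd"]
    return dict(totals)


def spend_by_cost_centre(usage_records):
    """Roll workspace spend up to cost centres; unmapped spend is flagged."""
    totals = defaultdict(float)
    for workspace, cost in spend_by(usage_records, "workspace").items():
        totals[COST_CENTRES.get(workspace, "UNASSIGNED")] += cost
    return dict(totals)


if __name__ == "__main__":
    print(spend_by(records, "workspace"))
    print(spend_by_cost_centre(records))
```

Routing unmapped workspaces to an "UNASSIGNED" bucket makes gaps in ownership visible in the report rather than hiding them.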
With cost reporting in place, you can manage spend within those workspaces by setting spend and rate limits.
- Rate Limits
Rate limiting protects engineering velocity by controlling how fast Claude can be consumed by your teams, not whether it can be used at all. It sets a maximum number of requests a workspace can make over a defined period, helping prevent runaway token consumption from parallel sessions or agentic workflows.
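To make the mechanism concrete, here is a minimal sliding-window sketch of a per-workspace request limit, of the kind a central gateway might apply. It illustrates the idea only and is not how Anthropic implements its own limits; the class name, parameters and workspace names are hypothetical.

```python
import time
from collections import defaultdict, deque


class WorkspaceRateLimiter:
    """Allow at most max_requests per workspace in any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._history = defaultdict(deque)  # workspace -> request timestamps

    def allow(self, workspace):
        """Return True and record the request if the workspace is under its cap."""
        now = self.clock()
        history = self._history[workspace]
        # Discard timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # over the cap: the caller should queue or back off
        history.append(now)
        return True


if __name__ == "__main__":
    limiter = WorkspaceRateLimiter(max_requests=2, window_seconds=60)
    for attempt in range(3):
        print(attempt, limiter.allow("platform-engineering"))
```

Each workspace has its own history, so one team hitting its cap does not slow another down.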
A common question from customers is: “How do we know we’re spending tokens on valuable work?”
Rate limits push cost accountability left. Rather than letting teams burn through tokens freely, they place a constraint on throughput, forcing engineers to prioritise high-value activity.
The risk in token costs is rarely one big prompt – it’s lots of low-value activity (oversized context, unnecessary background agents, or premium models for routine work). If engineers know throughput is capped, they have a reason to use it intentionally.
- Spend Limits
Spend limits create a financial safety net by capping how much a workspace can spend in a month. Unlike rate limits, which shape consumption in real time, spend limits manage total cost exposure by stopping spend from exceeding agreed thresholds. This is especially useful for budgeting or containing risk while adoption grows, providing a hard backstop that prevents uncontrolled cost.
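The behaviour can be sketched as a small budget tracker. This is an illustration under stated assumptions, not a description of how Anthropic enforces workspace limits: the class name is hypothetical, and the early-warning threshold (`alert_fraction`) is an addition of this sketch, not something a spend limit necessarily provides.

```python
class WorkspaceBudget:
    """Track month-to-date spend for one workspace against a hard cap."""

    def __init__(self, monthly_limit_usd, alert_fraction=0.8):
        self.monthly_limit_usd = monthly_limit_usd
        self.alert_fraction = alert_fraction  # assumed early-warning threshold
        self.spent_usd = 0.0

    def status(self):
        """Return 'ok', 'alert' or 'blocked' for the current spend."""
        if self.spent_usd >= self.monthly_limit_usd:
            return "blocked"
        if self.spent_usd >= self.alert_fraction * self.monthly_limit_usd:
            return "alert"
        return "ok"

    def record(self, cost_usd):
        """Add a charge and return the new status; refuse once the cap is hit."""
        if self.status() == "blocked":
            raise RuntimeError("monthly spend limit reached")
        self.spent_usd += cost_usd
        return self.status()


if __name__ == "__main__":
    budget = WorkspaceBudget(monthly_limit_usd=100.0)
    for charge in (50.0, 35.0, 20.0):
        print(charge, budget.record(charge))
```

An alert ahead of the hard stop gives workspace owners time to review usage before engineers are cut off mid-task.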
Spend limits are the most familiar cost-control lever because once the budget is spent, usage stops. But if the limit is set without understanding the value that usage is bringing (more PRs merged, for example, or a longer lifespan for deployed code), it can have a negative impact on engineering productivity.
- Committed Usage Discounts
Committed usage discounts should be approached as a negotiation lever, not a published entitlement.
According to public API pricing, costs are published as pay-as-you-go token rates by model, with no official or generally available committed usage discount programme.
However, Anthropic clearly has an enterprise sales motion. If token volumes are material, you should approach discussions with a demand forecast and a target discount – similar to how organisations already engage with AWS, Azure and GCP commitments.
- Session Hygiene
Session hygiene is the main optimisation lever that improves quality and reduces costs at the same time.
Claude becomes expensive when engineers let context sprawl through long conversations, oversized prompts and repeated file reads. The penalty is double: sprawl also degrades output quality, because the model has more irrelevant material to work through.
For engineers, cleaner sessions mean better outputs. Claude stays focused, follows instructions more accurately, and is less likely to get lost in stale context.
For example, starting a new session when switching tasks avoids unnecessary context build-up and reduces token usage.
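A rough worked example of why this matters, under simplifying assumptions: every request resends the full conversation history as input, each turn adds a fixed number of tokens, and prompt caching and context compaction are ignored (both would lower the real figures). The function name and the numbers are illustrative only.

```python
def cumulative_input_tokens(turns, tokens_per_turn, base_context=0):
    """Total input tokens sent over a conversation when each request
    resends the whole history (no caching or compaction assumed)."""
    total = 0
    context = base_context
    for _ in range(turns):
        context += tokens_per_turn  # the new turn joins the history
        total += context            # the whole history is sent as input
    return total


if __name__ == "__main__":
    one_long_session = cumulative_input_tokens(turns=40, tokens_per_turn=2_000)
    two_fresh_sessions = 2 * cumulative_input_tokens(turns=20, tokens_per_turn=2_000)
    print(one_long_session)    # 1640000
    print(two_fresh_sessions)  # 840000
```

Under these assumptions, the same 40 turns cost roughly half as many input tokens when split across two fresh sessions, because cumulative input grows with the square of the conversation length.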
Markdown files are a big part of making this work. CLAUDE.md provides persistent project context at the start of every session, removing the need to repeatedly explain build commands, coding standards, or architecture patterns.
- Model / Task Rightsizing
Engineers often default to the most capable model due to convenience, but this is rarely cost-efficient.
Using Opus 4.8 for every task is the cloud equivalent of spinning up the biggest EC2 instance to run a ‘hello-world’ app.
Anthropic is clear that Opus is the most capable model for complex work whereas Sonnet is for daily coding and Haiku for simpler, faster tasks. Pricing makes these trade-offs very visible. Opus 4.8 is $5 input / $25 output per MTok, Sonnet 4.6 is $3 / $15, and Haiku 4.5 is $1 / $5.
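To see what those rates mean for a single workload, the sketch below prices the same token volume on each tier, using the per-MTok figures quoted above. The workload size (2M input tokens, 200k output tokens) is a made-up example, and the short model keys are labels for this sketch rather than official model identifiers.

```python
# USD per million tokens as (input, output), taken from the figures in this article.
PRICES_PER_MTOK = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}


def session_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at pay-as-you-go token rates."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200k output tokens.
    for model in PRICES_PER_MTOK:
        print(model, session_cost(model, 2_000_000, 200_000))
    # opus 15.0, sonnet 9.0, haiku 3.0
```

For this workload, Sonnet costs 60% of the Opus price and Haiku 20%, which is the gap that rightsizing is trying to capture.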
Routine sessions should not automatically run on Opus 4.8. Reserve premium reasoning for premium work. Use Opus when the task genuinely justifies it, such as complex architecture, high-stakes debugging or multi-step agentic problem solving. But for the majority of day-to-day engineering workflows, start with Sonnet and upgrade if the task proves it needs more intelligence.
With rate and spend limits in force, engineers have to actively justify model choices. It’s simple supply and demand. Supply is limited, so engineers must weigh the demands of their tasks and adjust their Claude sessions to get the most productivity out of their spend or rate limits.
- Terminal Cost Tracking
To help engineers make the right cost vs value decisions, terminal cost tracking is an effective way to monitor token consumption in real time rather than after the monthly Anthropic bill arrives.
Using the /usage command, engineers can see session-level token consumption and remaining quota. This matters because, as with FinOps in cloud environments, cost awareness directly changes behaviour.
Engineers can identify inefficient sessions, oversized contexts, or unnecessary background activity – and adjust accordingly.
Summary
Claude is here to stay. Once engineers integrate it into their workflows, rolling it back is not realistic.
Optimising Claude without impacting engineering velocity is not about restricting adoption – it’s about applying the same discipline used in cloud and software spend management. The organisations that succeed will not be the ones that tell engineers to use AI less. They will be the ones that instrument and manage it correctly from day one so adoption scales cost-efficiently.