Turning Tokens into ... BZZT! Limit hit!

Claude Code users have been fighting their tokens. Even on a big subscription plan, folks are hitting limits faster than ever. This morning I accidentally killed my five-hour quota in just one hour. How did we get here?

It’s Not Just Your Imagination

Peak-Hour Burn Rates Changed

On March 26, Anthropic engineer Thariq Shihipar announced on X that 5-hour session limits now drain faster during weekday peak hours (5am–11am PT). Your weekly limit stays the same, but the system draws down your daily budget more aggressively when demand is highest. Anthropic says about 7% of users will hit limits they wouldn’t have before. If you’re a dev working US business hours, you’re probably in that 7%.

The Double-Usage Promotion Ended

Since March 13, Anthropic had been running a quiet promotion that doubled usage limits during off-peak hours. It ended March 27. If you’d gotten used to the extra headroom without realizing it was temporary, the return to normal felt like cold water in the face.

And Bugs in Claude Code?

After the Claude Code source was accidentally published to npm on March 31, researchers found two issues that might be creating problems:

  1. Autocompact retry loop. The autocompact feature — which summarizes old context when sessions grow long — had an infinite retry loop on failure. Anthropic’s data showed 1,279 sessions with 50+ consecutive failures, totaling ~250,000 wasted API calls per day. The fix was three lines of code.

  2. Cache invalidation on resume. Using --resume or --continue injected tool attachments at a different position than a fresh session, invalidating the entire conversation cache. Uncached tokens cost 10–20x more against your quota. A single --continue on a long session could obliterate your budget.
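To get a feel for why that cache miss hurts, here’s a rough back-of-the-envelope sketch in Python. The 10x multiplier is the low end of the range above; the session size and the “quota units” are invented for illustration, not Anthropic’s actual accounting:

```python
# Illustrative sketch: how a cache invalidation inflates effective token
# spend. The 10x uncached multiplier is the low end of the 10-20x range
# mentioned above; the session size is a made-up example.

def session_cost(context_tokens: int, cached: bool, uncached_multiplier: int = 10) -> int:
    """Relative quota units consumed to reprocess the session context."""
    return context_tokens if cached else context_tokens * uncached_multiplier

long_session = 150_000  # tokens of accumulated context (hypothetical)

cached_resume = session_cost(long_session, cached=True)     # cache intact
uncached_resume = session_cost(long_session, cached=False)  # --continue broke it

print(f"cached resume:   {cached_resume:,} units")
print(f"uncached resume: {uncached_resume:,} units ({uncached_resume // cached_resume}x)")
```

Same conversation, same tokens; the only difference is whether the cache survived.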

Anthropic has acknowledged the issues, and I expect some of these things to change. But the availability and cost of tokens are going to be the key issue for the future of software development.

You Can Spend Them Wisely

Model Choice Makes a Huge Difference

I have a workflow for software development that I love. It uses many agents in parallel to revise and improve my plans, implementation, and testing/refactoring. Those agents need context, they do work, and they write reports – each step chewing up tokens. If you’re on Claude Code and using Opus for everything, you’re spending tokens like a lottery winner.

Much of the work you’re doing really doesn’t need Opus’ deep thinking. I added instructions to my CLAUDE.md that tell Claude to default down:

Use Haiku for: file searches, grep, running tests, git history, reading files,
simple code generation with clear specs.

Use Sonnet for: planning, code review, refactoring, writing tests that require
understanding business logic, debugging.

Use Opus only for: architectural decisions, complex multi-file implementations,
or when a cheaper model already failed.

All tokens are not created, or charged, equally!

Opus output costs 5x Haiku on raw price alone. But Opus also generates more tokens per task — deeper thinking, longer responses. For something like grepping a codebase, you might spend 10–15x more on Opus for the same result. Five Haiku agents running in parallel cost a fraction of one Opus agent doing the same work sequentially — and the Haiku agents finish faster.
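Here’s a quick worked example in Python. Every number is a placeholder assumption – the prices are picked only to match the 5x output-price ratio, and the verbosity factor and task size are invented – so swap in the real figures for your plan:

```python
# Hypothetical per-million-token output prices, chosen only to reflect the
# ~5x Opus-to-Haiku ratio discussed above; real prices vary by model version.
OPUS_PER_M = 25.0
HAIKU_PER_M = 5.0

task_output_tokens = 40_000  # output the task needs (made-up figure)
opus_verbosity = 2.0         # Opus tends to emit more tokens per task

# One Opus agent doing the job sequentially:
opus_cost = (task_output_tokens * opus_verbosity / 1_000_000) * OPUS_PER_M

# Five Haiku agents splitting the same output in parallel:
haiku_cost = (task_output_tokens / 1_000_000) * HAIKU_PER_M

print(f"Opus:               ${opus_cost:.2f}")
print(f"Haiku x5, parallel: ${haiku_cost:.2f} ({opus_cost / haiku_cost:.0f}x cheaper)")
```

Under these assumptions the gap is 10x, and the parallel agents also finish sooner because the work isn’t serialized.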

This one simple tweak can dramatically drop your burn rate.

Work Smarter, Not Harder

Avoid --resume and --continue

You did some great work in that previous session, but you probably don’t need all the context. Start fresh sessions instead. You avoid the cache invalidation bug that can 10x your token burn. Carry context forward via CLAUDE.md instead. When you’re finishing a work session, ask Claude to write notes and plans for next time. Loading that intentional, curated context is way cheaper.
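For example, when wrapping up, you might ask Claude to append a short hand-off note to CLAUDE.md. The format below is a hypothetical sketch (the tasks are invented); any structure Claude can re-read at session start works:

```markdown
## Hand-off notes (last session)

- Done: extracted invoice logic into its own module; all tests green.
- Next: wire the new builder into the checkout flow; watch the tax edge case.
- Decisions: keeping webhooks synchronous for now; revisit if latency grows.
```

A few curated lines like this replace tens of thousands of tokens of replayed conversation history.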

Stay Focused

Long sessions accumulate context that has to be reprocessed or autocompacted. Finish a task, start a new session. Think of sessions like branches — small and focused beats long-running. Running several focused sessions in parallel is usually cheaper for your token burn than one monolithic session.

Constantly Update Your CLAUDE.md

Put project context, conventions, and decisions into your CLAUDE.md file instead of building up massive conversation histories. Claude reads it at the start of every session, keeping sessions shorter and cache hit rates higher.

Adapt Your Timing

The 5am–11am PT window is when the throttle bites hardest. Use early morning or evening sessions for the hard stuff – analysis, building plans, abstract thinking – and save the easier bits for the peak hours. If you can Haiku/Sonnet your way through peak hours, you’ll get way more value.

The Near Future

Anthropic told TechCrunch that paid subscriptions have already more than doubled in 2026. That’s a lot of new users hitting the same infrastructure in a very short window. The throttle, the bugs, the API downtime, the promotion-then-pullback: they all point to a system under more load than it’s really ready for.

It’ll get better, but it won’t get free. Anthropic has strong incentives to get it right. But right now, if you’re a Claude Code power user, you’ve got to be strategic with your work and processes to manage token burn.

Keep building.


Jeff Casimir
Principal, Jumpstart Lab
jeff@jumpstartlab.com

Does your team need help rolling Claude Code into everyday work? Through workshops and coaching, I can help them reach their potential.

Book a Conversation