Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.
I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.
That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in AGENTS.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”
On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.
I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.
My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best.
I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.
Ya, I've had this experience more than a few times recently. I've heard people claiming they are serving quantized models during high loads, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... but that wouldn't explain why it appears to be temporary...
My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)
Not the guy you're responding to, but when this happens the token counter is frozen at some low value (e.g. 1k-10k) as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.
This happened to me as well! It was especially infuriating because I had just barely upgraded to the $200 per month plan because I exhausted my weekly quota. Then the entire next day was a complete bust because of this issue. I want my money back!
I've been using the Codex Business subscription (about €30) for multiple months now. Even there they've cut back on the quota: a few months back it was hard for me to reach the limit.
Now it's easy.
Still, in comparison with Claude Code, the quota of Codex is a much better deal.
However, they should not make it worse...
The promotion has been extended until May 31st for the $100 and $200 subs.
At the same time, they’ve been giving out a ton of additional quota resets seemingly every other week (and committed to an additional reset for every million additional users until they hit 10mil on codex).
So they’ve really set a high bar for people’s expectations on their quota limits.
Once they drop the 2x promotion for good and stop the frequent resets, there are going to be a lot of complaints.
> Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
This is what I'm working on proving now.
It is more that there is a confidence score while thinking. Opus will quit if it is too high and will grind on if the confidence score is close to the real answer. Haiku handles this well too.
If you give Sonnet a hard task, it won't quit when it should.
Nonetheless, that issue has been fixed with Opus.
I'll try to show that running Opus on tasks of medium to hard difficulty is consistently the same price or cheaper than running them with Haiku and Sonnet, while easier tasks, the known busy work, are cheaper to run with Haiku.
> It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
Give it a custom sandbox and context for the work, so it has no opportunity to roam around when not required. AI agentic coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI), there's a whole lot of scope for improvement there.
The sandbox is fine, but if the parent has given explicit instruction of files to inspect, why is it not centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?
Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and the context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.
They changed plan mode so that it's instructed to follow a multi-step plan, the first step being to explore the code base. When you tell it to focus it's getting contradictory instructions from plan mode vs your prompt and it's essentially a coin flip which one it picks.
It does seem like a cynical attempt to make more money.
It was pretty much the first of the CLI agents and had a benchmark that was the go-to at the start of LLM coding. Now the benchmark doesn't get updated, and aider never got a mention in discussions of CLI tools until now.
Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that can't actually happen (e.g. a multiprocess race condition on a daemon I only ever run one instance of).
I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That’s just a wall of AI garbage.
Here’s what I’ve done to mostly fix my usage issues:
* Turn on max thinking in every session. It saves tokens overall because I’m not correcting it or having it waste energy on dead paths.
* Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all tokens need to be rebuilt; this gets especially bad as token usage goes up.
* Compact after 200k tokens as soon as I reasonably can. I have no data, but my usage absolutely skyrockets as I get into longer sessions. This is the most frustrating part because Anthropic forced the 1M model on everyone.
The problem is actually because their cache invalidates randomly so that's why replaying inputs at 200k+ and above sucks up all usage. This is a bug within their systems that they refuse to acknowledge.
They also silently raised the usage that input tokens consume, so it's a double whammy.
I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute.
For those not in the Google Gemini/Antigravity sphere, over the last month or so that community has been experiencing nothing short of contempt from Google when attempting to address an apparent bait and switch on quota expectations for their pro and ultra customers (myself included). [1]
While I continue to pay for my Google Pro subscription, probably out of some Stockholm-Syndrome-level loyalty and false hope that it is just a bug and not Google being Google and self-immolating a good product,
I have since moved to Kiro for my IDE and Codex for my CLI, and am as happy as a clam with this new setup.
For what it’s worth, that was pretty obvious from the get go it wasn’t a realistic long term deal. I’ve been building all the libraries I hoped existed over the past 1-2y to have something neat to work with whenever the free compute era ends. I feel that’s the approach that makes sense. Take the free tokens, build everything you would want to exist if you don’t have access to the service anymore. If it goes away you’re back to enjoying writing code by hand but with all the building blocks you dreamt of. If it never goes away, nothing wasted, you still have cool libs
So, antigravity will definitely quickly eat up your pro quota. You can run out of it in an hour (at least on the $20/mo plan) and then you'll be waiting five days for it to refresh.
However, I've found that the flash quota is much more generous. I have been building a trio drive FOC system for the STM32G474 and basically prompting my way through the process. I have yet to be able to run completely out of flash quota in a given five hour time window. It is definitely completing the work a lot faster than I could do myself -- mainly due to its patience with trying different things to get to the bottom of problems. It's not perfect but it's pretty good. You do often have to pop back in and clean up debris left from debugging or attempts that went nowhere, or prompt the AI to do so, but that's a lot easier than figuring things out in the first place as long as you keep up with it.
I say this as someone who was really skeptical of AI coding until fairly recently. A friend gave me a tutorial last weekend, basically pointing out that you need to instruct the AI to test everything. Getting hardware-in-loop unit tests up and running was a big turning point for productivity on this project. I also self-wired a bunch of the peripherals on my dev board so that the unit tests could pretend to be connected to real external devices.
I think it helps a lot that I've been programming for the last twenty years, so I can sometimes jump in when it looks like the AI is spinning its wheels. But anyway, that's my experience. I'm just using flash and plan mode for everything and not running out of the $20/mo quota, probably getting things done 3x as fast as I could if I were writing everything myself.
Ultimately we'll find more efficient techniques and hardware and AI companies will end up owning Nuclear Power Stations and continue providing models capable of 10x of what they are now.
Valuations have already reached the point where these companies can run their own nuclear power stations, fund development of new hardware and techniques, and boost the capabilities of their models by 10x.
Ads do not pay enough to cover AI usage. People see the big numbers Google and Facebook make in ads and forget to divide the number by the number of people they serve ads to, let alone the number of ads they served to get to that per-user number. You can't pay for 3 cents of inference with .07 cents of revenue.
You also can't put ads in code completion AIs because the instant you do the utility to me of them at work drops to negative. Guess how much money companies are going to pay for negative-value AIs? Let's just say it won't exactly pay for the AI bubble. A code agent AI puts an ad for, well, anything and the AI accidentally puts it into code that gets served out to a customer and someone's going to sue. The merits of the case won't matter, nor the fact the customer "should have caught it in review", the lawsuit and public reputation hit (how many people here are reading this and salivating at the thought of being able to post an angrygram about AIs being nothing but ad machines?) still cost way too much for the AI companies creating the agents to risk.
The evidence is that quotas exist, as seen here, and are low enough that people are hitting them regularly. When was the last time you hit your quota of Google searches? When was the last time you hit your quota of StackOverflow questions? When was the last time you hit your quota of YouTube videos? Any service will rate limit abuse, but if abuse is indistinguishable from regular use from the provider's perspective, that's not a good sign.
Can confirm, I initially enjoyed the 5-hour limits on Gemini CLI and Antigravity so much that I paid for a full year, thinking it was a great decision
In the following months, they significantly cut the 5-hour limits (not sure they even exist anymore), introduced an unrealistically bad weekly limit that I can fully consume in 1-2 hours, introduced the monthly AI credits system, and added ads to upgrade to Ultra everywhere.
At the very least the Gemini mobile app / web app is still kinda useful for project planning and day-to-day use I guess. They also bumped the storage from 2TB to 5TB, but I don't even use that
It should be illegal to change the terms of the subscription mid-period. If you paid for the full year, you should get that plan for the whole year. I don't understand how it's ok for corporations to just change the terms mid-way, and we just have to accept it.
I'm also hitting the limits in a day when it would last the entire week. The service is literally worth 4x to 6x less. Imagine I go to my favorite restaurant and I pay the same for 1/5th of the food. Bye bye, you have to vote with your wallet.
The response doesn't even make sense and appears to be written by AI.
> The March 6 change makes Claude Code cheaper, not more expensive. 1h TTL for every request could cost more, not less
Feels very AI.
> Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
They won't show a toggle because it will increase costs for some unknown percentage of requests?
Sounds like a decision I would make when memory is expensive and you want to get rid of the very long (in time) tail of waiting 1h to evict cache when a session has stopped.
There must be a better way to do this. The consumer option is the pricing difference. If they’d make cache writes the same price as regular writes, that would solve the whole problem. If you really want to push it, use that pricing only for requests where number of cache hits > 0 (to avoid people setting this flag without intent to use it), and you solved the whole issue.
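A minimal sketch of the proposed rule, with made-up per-million rates purely for illustration (the constants and function name are not from any real price sheet):

```python
# Hypothetical per-million-token rates, for illustration only.
REGULAR_WRITE_RATE = 3.00        # normal input tokens, $/M
CACHE_WRITE_PREMIUM_RATE = 3.75  # today's marked-up cache-write rate, $/M

def cache_write_cost(tokens: int, cache_hits: int) -> float:
    """Proposed rule: bill cache writes at the regular input rate,
    but only on requests that actually hit the cache at least once,
    so nobody sets the flag without intending to use it."""
    rate = REGULAR_WRITE_RATE if cache_hits > 0 else CACHE_WRITE_PREMIUM_RATE
    return tokens / 1_000_000 * rate
```

Under a rule like this, sessions that actually reuse the cache stop paying a premium for writes, while drive-by requests keep today's pricing.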
Maybe scared wasn't the best word... but we cannot deny Opus is a great, if not the greatest, model at coding, and Anthropic is the only one serving it at reasonable prices when going through their subscription model.
When a casino is making a lot of money from gamblers, they don't care about their customers losing money, given the machines are rigged against you.
Anthropic sells you 'knowledge' in the form of 'tokens', and you spend money rolling the dice, spinning the roulette wheel, and inserting coins for another try. They later add limits and dumb down the models (their gambling machines), so you pay for wrong answers.
Once you hit your limit or Anthropic changes the usage limits, they don't care and halt your usage for a while.
If you don't like any of that, just save your money and use local LLMs instead.
I don't get it. Last week on the 100 bucks plan I generated probably 50k LOC (not a quality measure for sure!) and just barely kissed the weekly limit. I did get rate limited on some sessions for sure, but that's to be expected.
I'm curious what are people doing that is consuming your limits? I can't imagine filling the $200 a month plan unless I was essentially using Claude code itself as the api to mass process stuff? For basic coding what are people doing?
Yea, I found myself maxing out the $20/mo plan occasionally, so I tried the $100/mo, but I don't think I even once even approached the session limit, let alone the weekly limit. And this is doing what I would consider heavy, continuous programming. I probably ought to go back down to $20 one. It would be nice if they had a cheaper tier in between them, but the tiers they have are probably a good business trick to get people to buy much more than they need.
This is the problem most people are facing. Before March, I had hit the rate limit a single time. That involved a security audit of our entire code base from a few different angles.
As of now, I’m consistently hitting my 5-hour limit in less than 1 hour during NA business hours. I’m getting to the point where I basically can’t use CC for work unless I work very early or late in the day.
I don't hit limits either on $100; it's more that claude-code seems to be constantly broken, and they added some vague bullshit about not using claude-code before 2pm, so I just don't expect it to work anymore and tend to use codex-cli as my driver nowadays. I also never hit limits in codex, but codex is $20/mo, not $100/mo, so it's making me consider reallocating the $100 I spend on Anthropic as play money for z.ai and other tools. I think claude-code has great training wheels (codex does not), but once the training wheels come off and claude-code becomes as unreliable as it has been, it makes you consider alternatives.
Anthropic is going through major growing pains, both technical and organizational. The left hand doesn't know what the right hand is doing. It's chaos, things are changing too quickly, and us users are getting caught in the middle of it.
Think Twitter's fail-whale problems. Sometimes you are lucky, sometimes you aren't. Why? We won't know until Anthropic figures it out and from the outside it sure looks like they're struggling.
I have the same experience as you. I’m wondering if it is regional? I’m in Europe so don’t overlap much with US usage, which is likely to be way higher
Also in Europe and can only agree. Granted I'm on the 20x plan, but I have yet to hit a limit once and I'm using Claude 12h+ per day on multiple projects.
What does it look like when you get rate limited? Does the instance just kind of sit and spin?
I suspect I was getting rate limited very aggressively on Thursday last week. It honestly infuriated me, because I'm paying $200 a month for this thing. If it's going to rate limit me, at least tell me what it's doing instead of just making it seem like it's taking 12 hours to run through something that I would expect to be 15 minutes. The worst part is that it never even finished it.
My gut feeling is this is not enough money for them by far (not to mention their investors), and we'll eventually get ratcheted up inline with dev salaries. E.g. "look how many devs you didn't have to hire", etc.
People need to understand a few things: vague questions make the models roam endlessly “exploring” dead ends, and “restarting” old chats immediately eats a lot of context.
Anthropic CAN change their limits and rates as they see fit, there’s never been hard promises or SLOs on these plans.
With that said, I pay the Pro subscription (20/mo) and I hit limits maybe 2/3 times over a period of 4 months building a simple running app in Python. I’d not call it production ready but it’s not nothing either.
If people were considerably more willing to aggressively prune their context and scope tasks well, they could get a lot more done with it, at least in my experience. Anthropic can’t really fix anything because the underlying model architecture can’t be “patched”. But I definitely feel a lot of people can’t wrap their heads around the new paradigms needed to effectively prompt these models.
Additionally, opting out is always an option… but these types of issues feel more like laziness than real, structural issues with the model/harness…
> Anthropic CAN change their limits and rates as they see fit, there’s never been hard promises or SLOs on these plans.
No they can't. When I buy an annual subscription and prepay for the year, they can't just go "ok now you get one token a month" a day in. I bought the plan as I bought it. They can't change anything until the next renewal.
Yes: Claude Code “consumes tokens” and starts a session when the computer is asleep without anything started. Or consumes 10% of my session for “What time is it?”
Something similar is happening with GitHub Copilot too. It's impossible to know what a "request" is, and some change in the last couple of months has seen my request usage go up for the same style of work. Toss in the bizarre, impossible-to-understand rate limiting that occurs with regular usage, and it's pretty obvious these companies are struggling to scale.
I'm finding the opposite with Copilot. A request is a prompt, with some caveats around what's generating the prompt. I am quite happily working with Opus 4.6 at 3x cost, and about 1/3 of the way through the month I'm sitting at ~25% usage of a Pro+ subscription. I find it quite easy to track my usage and rate of usage.
The overall context windows are smaller with Copilot, I believe, but it doesn't appear to be hurting my work.
I'm using it for approximately 4 hours a day most days, generally one-shotting fun ideas I thoroughly plan out in planning mode first. I have my own version of the idea -> plan -> analyse -> document implementation phases -> implement-via-agent loop: simulations, games, stuff I'm curious about, and resurrecting old projects that never really got off the ground.
Switched back to Codex for the promotion. Opus at the start of the year was the GOAT, just relentless at chewing through hard problems. Now it spins on pretty easy work (it took three swings just to edit a TS file) and my session is like 1-3 prompts (I downgraded to the $20 plan, but still).
It’s been unusable for me as my daily coding agent. I run out of credits in the pro account in an hour or so. Before that I had never reached the session limit. Switched back to Junie with Gemini/chatgpt.
Once people can't think for themselves anymore and businesses expect the level of productivity witnessed before, we'll have no choice but to cough up whatever providers bill us.
>and business expect the level of productivity witnessed before, will have no choice but cough up whatever providers bill us.
Is that bad? After all, even if they hiked to price infinity, you wouldn't worse off than if AI didn't exist because you could still code by hand. Moreover if it's really in a "business" (employment?) context, the tools should be provided by your employer, not least for compliance/security reasons.
Didn't they move too soon then? People haven't forgotten how to tie their shoelaces (yet). And anyway, they'll just move to a different model; last holdout wins.
"enshittification" gets thrown around a lot, but this is the exact playbook. Look at the previous bubble's cash cow: advertising.
Online advertising is now ubiquitous, terrible, and mandatory for anyone who wants to do e-commerce. You can't run a mass-market online business without buying Adwords, Instagram Ads, etc.
AI will be ubiquitous, and then it will get worse and more expensive. But we will be unable to return to the prior status quo.
I'm on the Free tier, using Claude exclusively for consultation (send a third-party codebase + ask why/where something is done). I also used to struggle to hit limits. Recently I was able to hit the limit after a single prompt.
I used Claude extensively until now, and just tested Gemini 3.1 Pro yesterday via AI Studio. In Gemini CLI they don't offer this; I don't know why.
Taking a second opinion has significantly helped me design the system better, and it helped me uncover my own and Claude's blind spots.
Also, agreed that it spends and wastes a lot of tokens on web search and often gets stuck in loops.
Going forward I will always use all three of them. My main coding agent is still Claude for now, but I'm happy to see this field evolving so fast; it's easy to switch and use the others on the same project.
No network effects or lock-in for a customer. Great to live in this period of time.
The entire goal is to be token efficient (over 50% cheaper) and, by extension, take advantage of LLMs' better reasoning at shorter context lengths.
This really started as an internal side project that made me more productive, I hope it will help others too. Apache 2.0
Currently it still can't compete with the subsidized coding-plan rates when using Anthropic API pricing (even though it beats CC when both use an API key), which tells me that all subscription-plan operators are losing money on such plans.
It's very easy to calculate the actual cost given they list the exact tokens used. If I take the AWS Bedrock pricing for Opus 4.6 1M context (because Anthropic's APIs are subsidized and sold at a loss), here's what each category costs:
Cache reads cost $0.31
Cache writes cost $105
Input tokens cost $0.04
Output tokens cost $28.75
The total spent in the session is $134.10, while the Pro Max 5x subscription is $100.
Even taking Anthropic's API pricing, we arrive at $80.58. Below the subscription price, but not by much.
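As a quick sanity check, the per-category dollar amounts quoted above do add up to the stated session total (the figures are the commenter's, not an official rate card):

```python
# Per-category session costs from the breakdown above (commenter's figures).
costs = {
    "cache_reads": 0.31,
    "cache_writes": 105.00,
    "input": 0.04,
    "output": 28.75,
}

total = round(sum(costs.values()), 2)
print(total)  # 134.1, matching the quoted $134.10 session total
```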
It's just the end of the free tokens, nothing to see here.
I've been feeling the squeeze too. I've tried switching between different models as a test, I can at least say it feels like the limits are about half of what they used to be a few months ago. I'd be totally willing to concede that this is just my perception if Anthropic would only release some tools for measuring your usage.
In theory the /stats command tells you how many tokens you've used, which you could use to compute how much you are getting for your subscription. In practice it doesn't contain any useful info; it may be counting what is printed to the terminal or something. My stats suggest my Claude Code usage is a tiny number of tokens, so either that count is extremely underestimated, or they are charging much more per token for the subscription than for the API (which is not supposed to be the case).
Last week's free extra-usage quota shed some light on this. It seems like the reported tokens are probably between 1/30th and 1/100th of the actual tokens billed, from looking at how they billed (/stats went up 10k tokens and I was billed $7.10). With the API it should be $25 for a million tokens.
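Plugging in the numbers from this comment, the implied undercount factor comes out to roughly 28x, consistent with the lower end of that 1/30th-to-1/100th range (the $25/M figure is the commenter's assumption):

```python
reported_tokens = 10_000     # what /stats showed
billed_dollars = 7.10        # what was actually charged
api_rate_per_million = 25.0  # assumed API rate, $/M tokens

# What the reported tokens should have cost at the API rate
expected_cost = reported_tokens / 1_000_000 * api_rate_per_million  # $0.25

# Ratio of actual billing to the cost implied by /stats
undercount_factor = billed_dollars / expected_cost
print(round(undercount_factor, 1))  # 28.4
```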
I've got a dual path system to keep costs low and avoid TOS violations.
For general queries and investigation I will use whatever public/free model is available without being logged in. Not having a bunch of prior state stacked up all the time is a feature for me. This is essentially my google replacement.
For very specific technical work against code files, I use prepaid OpenAI tokens in VS Code Copilot as a "custom" model (it's just gpt5.4).
I burn through maybe $30 worth of tokens per month with this approach. A big advantage of prepaying for API tokens is that I can see everything Copilot is doing in my usage logs. If I use the pre-canned coding-agent products, the prompts are all hidden in another layer of black box.
I had Max plan and never reached its limit despite constantly working. Now I use the Pro plan and regularly reach the 5h limit as well as the weekly limit, as expected.
I found that it makes a huge difference if you provide clear context when developing code. If you leave room for interpretation, Claude Code uses up tokens much faster than in a well-defined context.
The same goes for its response time, which gets longer when there isn't much documentation about the project.
Nope. It has become much much slower for me as well. It’s weird cause at times I will get a response very quickly, like it used to be. But most of the time I have to wait quite a bit for the simplest tasks.
For whoever else is having the same problems, worth voting these kind of issues. There needs to be more transparency over what goes on with our subscriptions.
We vote here on HN, and it's much more effective. Anyone from Anthropic reading conversations like this one should be worried: we'll jump ship if they don't address such glaring issues.
There are MANY accounts of claude degradation (intelligence, limits) over the past week on reddit and here with many posts describing people moving. Nothing is changing. You'd think they'd at least give a statement.
The nice iOS app is a big convenience for me, but I’m starting to think I should just put my $20 in Open Router. It seems like minimax is a pretty solid competitor. I’m curious if the US-centric “frontier” is just marketing.
imo that’s what I’m doing. Trialing the Hermes harness since I can hook it up to signal. StepFun 3.5 Flash for general assistant stuff and Kimi/Minimax for software development
Unless the agent code is open-sourced, there is hardly any transparency in how the agent is spending your tokens or how it calculates them. It's like asking your lawyer why they charged some amount.
Yeah perplexity used to be great but they've also clamped down on the 20€ plan. Only one deep research query was enough to block me until the end of the month.
The thing is, if it's going to be this expensive, it's not going to be worth it for me. Then I'd rather do it myself. I'm never going to pay for a €100 subscription; that's insane. It's more than my monthly energy bill.
Maybe from a business standpoint it still makes sense because you can use it to make money, but as a consumer no way.
"Hey Claude, can you help me create a strategy to optimize my token use so I don't run into limits so often?" --> worked for me! I had two $200 plans before and now I am cool despite all day use
I had used Claude Code max as my daily driver last year and this sort of drama was par for the course. It's why I migrated entirely to Codex, despite liking Claude, the harness, more.
There's this honeymoon period with Claude you experience for a month or two followed by a trough of disillusionment, and then a rebound after a model update (rinse and repeat). It doesn't help that Anthropic is experiencing a vicious compute famine atm.
I've been using Code for half a year, these past couple weeks have been a totally different experience I'm on max 20, and seeing my weekly quota going bust in ~3 days is a bit absurd when nothing has significantly changed in the way I work
It’s further frustrating that I have committed to certain project deadlines knowing that I’d be able to complete it in X amount of time with agent tooling. That agentic tooling is no longer viable and I’m scrambling to readjust expectations and how much I can commit to.
I refuse to use anthropic's models (and openai, gemini) because the math simply doesn't add up.
Add to that the fact that we are being taken for fools with dramatic announcements and FOMO messages. I even suspect some reaction farms are being run to boost posts from people boasting about Claude models.
These don't happen for codex. Nor for mistral. Nor for deepseek. It can't just be that Claude code is so much better.
There are open-weight models that work perfectly fine for most cases, at a fraction of the cost. Why are more people not talking about those? Manipulation.
Mistral isn't that great. Deepseek was good when they first added thinking. But most people just try something out, and if it doesn't work on that model, then the model is bad. Claude, Codex, and Gemini just are that much better now, but if they quantize or cut limits, they destabilize, and you're right, you might as well just use something worse but reliable.
I regularly compare models. You are right, Deepseek was more impressive when the latest version came out. But since then they have accepted slowing down throughput while keeping the same quality.
I often compare with Gemini. Sure those Google servers are super fast. But I can't see it better. Qwen and deepseek simply work better for me.
Haven't tested Mistral in a while, you may be right.
People try things out and feel comfortable using U.S. models (I can see the logic), but mostly for brand recognition. Anthropic and OpenAI are the best, aren't they? When the models jam, they blame themselves.
How good are local LLMs at coding these days? Does anyone have any recommendations for how to get this setup? What would the minimum spend be for usable hardware?
I am getting bored of having to plan my weekends around quota limit reset times...
Get a second-hand 3090/4090 or buy a new Intel Arc Pro B70. Use MoE models and offload to RAM for the best bang for your buck. For speed, try to find a model that fits entirely within VRAM. If you want to use multiple GPUs, you might want to switch to vLLM or something else.
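As a concrete sketch with llama.cpp, a MoE model can run with its expert tensors offloaded to system RAM while everything else stays in VRAM (the model path, context size, and port below are placeholders to adjust for your setup):

```shell
# Keep attention and shared layers in VRAM, push MoE expert weights to CPU RAM.
#   -ngl 99        : offload as many layers as possible to the GPU
#   -ot "exps=CPU" : override-tensor rule that sends expert tensors to CPU
#   -c 32768       : context size; lower it if you run out of memory
llama-server \
  -m ./models/your-moe-model.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  -c 32768 \
  --port 8080
```

If the whole model fits in VRAM, drop the `-ot` rule; the offload only pays off when the experts would not fit otherwise.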
Besides some of the obvious hacks to reduce token usage, properly indexed code bases (think IntelliJ) reduce token usage significantly (30%-50%, while keeping or exceeding baseline result quality), as shown with https://github.com/ory/lumen
Anthropic is not incentivized to reduce token use, only to increase it, which is what we are seeing with Opus 4.6. And now they are putting the screws on.
A little off topic, but did Anthropic distill from an older OpenAI model? All of a sudden, over the last few days, I'm getting a ton of em dashes in Claude Code responses!
My personal experience is way different: I struggle to burn through more than 50% of the 5 hour limit
For context, with Google AI Pro I can burn through the Antigravity weekly limit in 1-2 hours if I force it to use Gemini 3.1 Pro. Meanwhile, Gemini 3 Flash is basically unlimited but frequently produces buggy code or fails to implement things how I personally would (it felt like it doesn't "think" like a software dev).
I also tried VS Code + Cline + OpenRouter + MiniMax M2.7. It's quite cheap and seems to be better than Gemini 3 Flash, but it gets really pricey as the context fills up, because prompt caching is not supported for MiniMax on OpenRouter. The result itself usually needs 3-6 revisions on average, so the context fills up pretty often.
Eventually I got Claude Max 5x to try for a month. VS Code + Claude Code extension on a ~15k lines codebase, model set to "Default", and effort set to "Max". So far it's been really good: 0-2 revisions on average, and most of the time it implements things exactly how I would or better. And, like I said, I can only consume 40-60% of the 5-hour limits no matter how hard I try
Granted, I'm not forcing it to use Opus like OP (nor do I use complicated skills or launch multiple tasks at the same time), but I feel like they really nailed the right balance of when to use which model and how to pass context between them. Or at least enough that I haven't felt the need to force it to use Opus all the time.
They also need to fix the 30 second lag between submitting the request and actually starting to get tokens back - it used to be instant, and still is at work where we use Anthropic models via an enterprise copilot subscription.
It's a bit shocking to me how opaque the pricing for the frontier labs' subscription services is. It's basically impossible for people to tell what they're actually buying, and difficult to even meaningfully report or compare experiences.
Yeah, I cancelled the moment I realized that the subscription is a scheme to get you to constantly dip into extra usage. I get more benefit out of Claude on the free tier than on Pro.
I think the right way to think of it, from a self-protective perspective, is this: the real offer is the per-token pricing. Use that for a while, and iff you are consistently spending more than $20/mo, treat the subscription offering as a discount on some of that usage. So only on that condition, try the subscription and see if your monthly costs go down (because of the short term rate limits, they may not, depending on your usage patterns!).
But the opacity itself is a bit offensive to me. It feels shady somehow.
Codex is my preferred, I use it at work. The whole "Department of War" fiasco was enough for me to say Goodbye to OAI for personal. I'm a Claude person now. It's about the same level of performance really.
We also experienced hitting our Claude limits much earlier than before during the last two weeks. Up to a degree where we were thinking it must be a bug.
Wasn't Anthropic previously offering double the token usage outside busy hours? Now they are counting tokens at the normal rate again. But yeah, it's not good. I use Codex because Claude insists on peeking at, and messing with, folders and files outside its work area.
this same pattern seems to occur every time a new model is about to release. i didn't notice the usage problem - i am on 20x. but opus 4.6 feels significantly dumber for some reason. i can't quantify it, but it failed on everyday tasks where it used to complete perfectly
Every time there is a new model coming, I think they deteriorate the current one. This happens every darn time. Opus 4.6 isn't as sharp, not even close to what it was a few weeks ago.
I’m processing some images (custom board game images -> JSON) with a common layout and basic structure, and I exhausted my quota after just 30 images (pleb Pro account). I have 700 images to process…
What I did instead is tune the prompt for gemma 4 26b and a 3090. Worked like a charm. Sometimes you have to run the main prompt and then a refinement prompt or split the processing into cases but it’s doable.
Now I’m waiting for anyone to put up some competition against NVIDIA so I can finally be able to afford a workstation GPU for a price less than a new kidney.
I feel like I am living in a bubble, no one seems to mention Antigravity in these discussions and I have not had any issues with Ultra subscription yet. It seems to go on forever and the Interface is so much better for dev work as compared to CC. (Though admittedly my experience with cc is limited).
I strongly believe google's legs will allow it to sustain this influx of compute and still not do the rug-pull like OAI or Anthropic will be forced to do as more people come onboard the code-gen use case.
I don't use Claude so this doesn't affect me, but I worry it will spoil the fun for me for following reason.
They inflated how much their tools burn tokens from day one, pretty much. Remember all the pointless research and reports Claude always wanted to do, no matter what you asked it? Other tools are much smarter about this, so for them it's not such a big deal.
More importantly, these moves tend to reverberate through the industry, so I expect others will clamp down on usage a lot, and that will spoil my joy of using AI without counting every token.
Burning tokens doesn't just waste your allotment; it also wastes your time. This gave rise to the turbo offerings, where you get responses faster but burn 2x your tokens.
I mean this is expected is it not? These companies burned unimaginable amounts of investor cash to get set up and now they have to start turning a profit. They can't make up for the difference with volume because the costs are high, so the only option is to raise prices.
I’ve moved away from Claude and toward open-source models plus a ChatGPT subscription.
That setup has worked really well for me: the subscription is generous, the API is flexible, and it fits nicely into my workflow. GPT-5.4 + Swival (https://swival.dev) are now my daily drivers.
Show us some receipts in the form of an exported session. I was a heavy user of Claude up until the end of Feb, but switched to Codex because it's better at handling large code bases, following the "plan", and implementing the backend changes in Zig. If you ask Claude to review the code and suggest fixes, then let Codex review it, then ask Claude again, 99% of the time it will say: "Oh yes, you are right, let me fix that."
Either you are using it wrong or you are working in a totally different field.
Yeah, it's much better. Another plus is that you can use it with OpenCode (or other third-party tools), so you can easily switch between Codex and most other models from alright companies (not Anthropic or Google).
I hit the limits on the lower tiers of Codex just as fast as with Claude. At the moment I'm cycling between Claude, Codex, GLM5.1, and Kimi. The latter two are getting good enough, though, that I can make things go really far by doing planning with Opus and then switching to one of the cheap models for execution.
That’s why I switched to Codex. It’s so much more generous and in my experience, just as good. Also, optimizing your setup for working with agents can easily make a 5x difference.
I don't understand Anthropic. Be consistent. Why do models deteriorate to shit? This is not good for workflows or trust. What, Opus 4.7 is going to come out and the same thing happens again? Come on.
I spend the full 20x weekly quota in less than 10 hours. How is that possible? Well, try to mass-translate texts into 30 languages and you will hit the limits extremely quickly.
Constant complaints about Anthropic. Not much on OAI/Codex. It seems people should just use OAI and come back when they realize compute isn’t free elsewhere.
Demand is higher than supply; it is just the start of the bubble.
Everyone and their dog is burning tokens on stupid shit; that compute would be freed up if they asked the model to write deterministic code for the task and then ran the code. OpenAI and Anthropic are cutting free use and decreasing limits because they cannot meet the demand.
When the general public catches up with how to really use these tools, demand will fall, and the supply being built today will become oversupply. That's when the bubble will burst.
This is your regular friendly reminder that these subscriptions do not entitle you to any specific amount of usage. That "5x" is utterly meaningless because you don't know what it's 5x of.
This is by design, of course. Anyone who has been paying even the slightest bit of attention knows these subscriptions are not sustainable, and the prices will have to go up over time. Quietly reducing the usage limits that they were never specific about in the first place is much easier than raising the prices of the individual subscription tiers, with the same effect.
If you want to know what kind of prices you'll be paying to fuel your vibe-coding addiction in a few years, try out API pricing for a bit, and try not to cry when your $100 credit is gone in 2 days.
so basically the anthropic employee who responded says those 1h cache writes were almost never accessed, so a silent 5m cache change is in our best interest and saves cost (justifying why they did this silently).
however, his response gaslights us, because the math in the OP's opening post demonstrates this is not true: it shows reads running 26x higher, so at least in his case the cache is not doing what the anthropic employee describes.
clearly we are being charged for less optimization here, and the message (from my perspective, by anthropic) is that if you are in a special situation, your needs don't matter and we will close your thread without really listening.
what also gives it away is the refusal to at least expose this TTL via a parameter, in the same sentence as informing us the 5m value won't change because it's in our interest.
It's also in the interest of the users to keep certain params private; we are meant to deduce that. Did you not?
Why are so many 'developers' complaining about Claude rate-limiting them? You know you can actually... use local LLMs instead of donating your money to Anthropic's casino?
I guess this is fitting when the person who submitted the issue is in "AI | Crypto".
Well, there's no crying at the casino when you exhaust your usage or token limit.
Some months ago I created a piece of software for exactly this reason. It had no success, but the idea is that communities could reduce token consumption: not everything has to be an LLM call, and agents can share the results of API calls with each other. Even though my idea went nowhere, I still think sharing between agents is a good concept. If you have any interest, it's called tokenstree.com.
Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.
I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.
That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”
On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.
I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.
My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best
I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.
That happens to me all the time. My current working theory is that when their servers are hammered, there is a queueing system that is invisible to end users.
Ya I've had this experience more than a few times recently. I've heard people claiming they are serving quantized models during high loads, but it happens in cursor as well so I don't think it's specific to Anthropics subscription. It could be that the context window has just gotten into a state that confuses the model... But that wouldn't explain why it appears to be temporary...
My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)
The Cursor one is back to Claude 4, or 3.5 at best. It struggles to do things it did effortlessly a few weeks ago.
It's not a load issue either; it's just fully downgraded. It feels more like they're dialing in what they can get away with, and they are pushing it very far.
Set MAX_THINKING_TOKENS to 0, Claude's thinking hardly does anything and just wastes tokens. It actually often performs worse than without thinking.
Not the guy you're responding to, but when this happens the token counter is frozen at some low value (eg. 1k-10k) value as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.
This exact thing is happening to me since yesterday. It comes back to life when I throw the whole session away.
This happened to me as well! It was especially infuriating because I had just barely upgraded to the $200 per month plan because I exhausted my weekly quota. Then the entire next day was a complete bust because of this issue. I want my money back!
What day was it?
Thursday starting mid to late morning, and ended Friday night (US timezone).
I'm using the Codex Business subscription (about 30€) already for multiple months. Even there they cut back on the quota. A few months back it was hard for me to reach the limit. Now it is easier.
Still, in comparison with Claude Code, the quota of Codex is a much better deal. However, they should not make it worse...
I have the exact opposite experience. I can run claude forever, my codex quota was done by Wednesday morning.
OpenAI had a promotion that gave everyone double their rate limits until April 2nd.
Promotion has been extended til May 31st for the $100 and $200 subs.
At the same time, they’ve been giving out a ton of additional quota resets seemingly every other week (and committed to an additional reset for every million additional users until they hit 10mil on codex).
So they’ve really set a high bar for people’s expectations on their quota limits.
Once they drop the 2x promotion for good and stop the frequent resets, there are going to be a lot of complaints.
> Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
This is what I'm working on proving now.
It is more that there is a confidence score while thinking. Opus will quit if it is too high and will grind on if the confidence score is close to the real answer. Haiku handles this well too.
If you give Sonnet a hard task, it won't quit when it should.
Nonetheless, that issue has been fixed with Opus.
I'll try to show that the cost of using Opus on tasks of medium to hard difficulty is consistently the same as, or cheaper than, running them with Haiku and Sonnet. Easier tasks, the well-understood busy work, are cheaper to run with Haiku.
> It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
Give it a custom sandbox and context for the work, so it has no opportunity to roam around when not required. AI agentic coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI), there's a whole lot of scope for improvement there.
But the problem is it used to not need that before. These days, you have to think twice before you summon a subagent.
The sandbox is fine, but if the parent has given explicit instruction of files to inspect, why is it not centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?
Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and the context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.
They changed plan mode so that it's instructed to follow a multi-step plan, the first step being to explore the code base. When you tell it to focus it's getting contradictory instructions from plan mode vs your prompt and it's essentially a coin flip which one it picks.
It does seem like a cynical attempt to make more money.
> Claude has gotten noticeably worse for me too.
My experience is limited only to CC, Gemini-cli, and Codex - not Aider yet, trying different combinations of different models.
But, from my experience, CC puts everything else to shame.
How does Cursor compare? Has anyone found an Aider combination that works as well?
Is Aider even considered anymore?
It was pretty much the first CLI agent, and its benchmark was the go-to at the start of LLM coding. Now the benchmark doesn't get updated, and Aider never got a mention in discussions of CLI tools - until now.
Aider is dead because it's from the pre-function-calling era of this tech.
Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that's impossible to happen (e.g. a multiprocess race condition on a daemon I only ever run one instance of).
I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That’s just a wall of AI garbage.
Here’s what I’ve done to mostly fix my usage issues:
* Turn on max thinking in every session. It saves tokens overall, because I'm not correcting it or having it waste energy on dead paths.
* Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like the whole context needs to be rebuilt, which gets especially bad as token usage goes up.
* Compact after 200k tokens as soon as I reasonably can. I have no data, but my usage absolutely skyrockets as I get into longer sessions. This is the most frustrating part, because Anthropic forced the 1M model on everyone.
The problem is actually that their cache invalidates randomly, which is why replaying inputs at 200k+ tokens eats up all your usage. This is a bug in their systems that they refuse to acknowledge.
They also silently raised how much usage input tokens consume, so it's a double whammy.
Can’t you turn the 1M off with a /model opus (or /model sonnet)?
At least up until recently the 1M model was separated into /model opus[1M]
Everything starts to feel like AI slop these days. Including this comment.
I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute.
For those not in the Google Gemini/Antigravity sphere, over the last month or so that community has been experiencing nothing short of contempt from Google when attempting to address an apparent bait and switch on quota expectations for their pro and ultra customers (myself included). [1]
While I continue to pay for my Google Pro subscription, probably out of some Stockholm Syndrome, beaten-wife-level loyalty and false hope that it is just a bug and not Google being Google and self-immolating a good product, I have since moved to Kiro for my IDE and Codex for my CLI, and I am as happy as a clam with this new setup.
[1] https://github.com/google-gemini/gemini-cli/issues/24937
For what it’s worth, that was pretty obvious from the get go it wasn’t a realistic long term deal. I’ve been building all the libraries I hoped existed over the past 1-2y to have something neat to work with whenever the free compute era ends. I feel that’s the approach that makes sense. Take the free tokens, build everything you would want to exist if you don’t have access to the service anymore. If it goes away you’re back to enjoying writing code by hand but with all the building blocks you dreamt of. If it never goes away, nothing wasted, you still have cool libs
So, antigravity will definitely quickly eat up your pro quota. You can run out of it in an hour (at least on the $20/mo plan) and then you'll be waiting five days for it to refresh.
However, I've found that the flash quota is much more generous. I have been building a trio drive FOC system for the STM32G474 and basically prompting my way through the process. I have yet to be able to run completely out of flash quota in a given five hour time window. It is definitely completing the work a lot faster than I could do myself -- mainly due to its patience with trying different things to get to the bottom of problems. It's not perfect but it's pretty good. You do often have to pop back in and clean up debris left from debugging or attempts that went nowhere, or prompt the AI to do so, but that's a lot easier than figuring things out in the first place as long as you keep up with it.
I say this as someone who was really skeptical of AI coding until fairly recently. A friend gave me a tutorial last weekend, basically pointing out that you need to instruct the AI to test everything. Getting hardware-in-loop unit tests up and running was a big turning point for productivity on this project. I also self-wired a bunch of the peripherals on my dev board so that the unit tests could pretend to be connected to real external devices.
I think it helps a lot that I've been programming for the last twenty years, so I can sometimes jump in when it looks like the AI is spinning its wheels. But anyway, that's my experience. I'm just using flash and plan mode for everything and not running out of the $20/mo quota, probably getting things done 3x as fast as I could if I were writing everything myself.
Ultimately we'll find more efficient techniques and hardware, and the AI companies will end up owning nuclear power stations and providing models capable of 10x what they do now.
Valuations have already reached the point where these companies can run their own nuclear power stations, fund development of new hardware and techniques, and boost the capabilities of their models by 10x.
Lights on = Ads in your output. EOY latest; they can't keep kicking the massive costs down the road.
Ads do not pay enough to cover AI usage. People see the big numbers Google and Facebook make in ads and forget to divide the number by the number of people they serve ads to, let alone the number of ads they served to get to that per-user number. You can't pay for 3 cents of inference with .07 cents of revenue.
You also can't put ads in code completion AIs because the instant you do the utility to me of them at work drops to negative. Guess how much money companies are going to pay for negative-value AIs? Let's just say it won't exactly pay for the AI bubble. A code agent AI puts an ad for, well, anything and the AI accidentally puts it into code that gets served out to a customer and someone's going to sue. The merits of the case won't matter, nor the fact the customer "should have caught it in review", the lawsuit and public reputation hit (how many people here are reading this and salivating at the thought of being able to post an angrygram about AIs being nothing but ad machines?) still cost way too much for the AI companies creating the agents to risk.
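The arithmetic in the comment above can be spelled out directly. Both figures (3 cents of inference per query, 0.07 cents of ad revenue per query) are the commenter's rough assumptions, not measured numbers; the sketch just shows the gap they imply.

```python
# Illustrative numbers from the comment above (rough assumptions, not
# measured figures): ~3 cents of inference cost per query vs
# ~0.07 cents of ad revenue per query.
inference_cost_usd = 0.03
ad_revenue_usd = 0.0007

loss_per_query = inference_cost_usd - ad_revenue_usd
cost_to_revenue = inference_cost_usd / ad_revenue_usd
print(f"loss per ad-supported query: ${loss_per_query:.4f} "
      f"(inference costs ~{cost_to_revenue:.0f}x the ad revenue)")
```

Under those assumptions, every ad-supported query loses nearly 3 cents, and costs run roughly 43x the ad revenue, so ads would need to become over an order of magnitude more lucrative (or inference that much cheaper) before the model closes.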
Where is your evidence of this "massive cost"? Inference is massively profitable for both Anthropic and OpenAI. Training is not.
The evidence is that quotas exist, as seen here, and are low enough that people are hitting them regularly. When was the last time you hit your quota of Google searches? When was the last time you hit your quota of StackOverflow questions? When was the last time you hit your quota of YouTube videos? Any service will rate limit abuse, but if abuse is indistinguishable from regular use from the provider's perspective, that's not a good sign.
This article convinced me otherwise https://www.wheresyoured.at/the-subprime-ai-crisis-is-here/
The majority of accounts are free - these are profitable?
IMO they need as many users before their IPO - then the changes will really begin.
source?
After googling https://www.reddit.com/r/singularity/comments/1psesym/openai...
>OpenAI's compute margin, referring to the share of revenue excluding the costs of running its AI models for paying users
Huh?
The reddit summary comment makes no sense. How are they getting revenues without ads or paying customers?
"After" makes more sense.
FTA:
>The company has yet to show a profit and is searching for ways to make money to cover its high computing costs and infrastructure plans.
Fellow annoyed Google AI Pro subscriber here!
Can confirm, I initially enjoyed the 5-hour limits on Gemini CLI and Antigravity so much that I paid for a full year, thinking it was a great decision
In the following months, they significantly cut the 5-hour limits (not sure they even exist anymore), introduced the unrealistically bad weekly limit that I can fully consume in 1-2 hours, introduced the monthly AI credits system, and added ads to upgrade to Ultra everywhere.
At the very least the Gemini mobile app / web app is still kinda useful for project planning and day-to-day use I guess. They also bumped the storage from 2TB to 5TB, but I don't even use that
It should be illegal to change the terms of the subscription mid-period. If you paid for the full year, you should get that plan for the whole year. I don't understand how it's ok for corporations to just change the terms mid-way, and we just have to accept it.
T&C?
Don't bother upgrading to ultra. It's also now easy to burn all your credits where in Jan it was almost impossible
> We may very well look back on the last couple years as the golden era of subsidized GenAI compute.
Looks like enshittification on steroids, honestly.
Getting $5000 worth of product essentially free and then being told to pay is not enshittification.
I'm also hitting the limits in a day when it would last the entire week. The service is literally worth 4x to 6x less. Imagine I go to my favorite restaurant and I pay the same for 1/5th of the food. Bye bye, you have to vote with your wallet.
I noticed the same in last weeks. I canceled my Max 5X and subscribed to Copilot (with Opus 4.6).
It is hard now to hit the limit...
Quite scared by the fact that the original issue pointing out the actual root cause has been 'Closed as not planned' by Anthropic.
https://github.com/anthropics/claude-code/issues/46829
The response doesn't even make sense and appears to be written by AI.
> The March 6 change makes Claude Code cheaper, not more expensive. 1h TTL for every request could cost more, not less
Feels very AI.
> Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
They won't show a toggle because it will increase costs for some unknown percentage of requests?
Sounds like a decision I would make when memory is expensive and you want to get rid of the very long (in time) tail of waiting 1h to evict cache when a session has stopped.
There must be a better way to do this. The consumer-friendly option is to fix the pricing difference: if they made cache writes the same price as regular input, that would solve the whole problem. If you really want to push it, use that pricing only for requests where the number of cache hits > 0 (to avoid people setting the flag with no intent to use it), and the whole issue is solved.
Memory is expensive? If reads are as rare as they claim you can just stash the KV-cache on spinning disk.
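A back-of-the-envelope sketch of the cache math being argued over here. The multipliers are Anthropic's published pricing relative to base input tokens (5m cache write = 1.25x, 1h write = 2x, cache read = 0.1x); the turn counts are illustrative, loosely based on the ~26 reads per write reported in the thread.

```python
# Relative cost (in base-input-token units) of one prompt prefix across a
# session. Multipliers per Anthropic's published prompt-caching pricing:
# 5m cache write = 1.25x, 1h cache write = 2x, cache read = 0.1x.
READ_MULT = 0.1

def prefix_cost(writes, reads, write_mult):
    """Total relative cost: each cache write plus each cache-hit read."""
    return writes * write_mult + reads * READ_MULT

no_cache  = 27 * 1.0                  # prefix resent uncached on all 27 turns
warm_5m   = prefix_cost(1, 26, 1.25)  # 5m TTL, turns always < 5 min apart
warm_1h   = prefix_cost(1, 26, 2.0)   # 1h TTL, one write covers the session
choppy_5m = prefix_cost(9, 18, 1.25)  # 5m TTL with gaps: re-written every 3rd turn

print(f"no cache: {no_cache:.2f}, warm 5m: {warm_5m:.2f}, "
      f"warm 1h: {warm_1h:.2f}, choppy 5m: {choppy_5m:.2f}")
```

Under these assumptions the warm 5m cache is indeed the cheapest case, but as soon as turns are spaced more than five minutes apart, the forced 5m TTL re-writes the prefix repeatedly and ends up costing nearly 3x what the 1h option would, which is exactly the situation the issue describes.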
Why scared? Like, if their software gets bad, we stop using it.
Maybe scared wasn't the best word... but we cannot deny Opus is a great, if not the greatest, model at coding, and Anthropic is the only one serving it at reasonable prices through their subscription model.
Sounds like an addiction to me
When a casino is making a lot of money from gamblers, they don't care about their customers losing money, given the machines are rigged against you.
Anthropic sells you 'knowledge' in the form of 'tokens', and you spend money rolling the dice, spinning the roulette wheel, and inserting coins for another try. They later add limits and dumb down the models (which are their gambling machines), so you pay for the wrong answers.
Once you hit your limit or Anthropic changes the usage limits, they don't care and halt your usage for a while.
If you don't like any of that, just save your money and use local LLMs instead.
I think this comes from Anthropic recently implementing auto routing of model effort. You can manually set effort with /effort in CC.
It does seem like this new routing is worse for the consumer in terms of code quality and token usage somehow.
I don't get it. Last week on the 100 bucks plan I generated probably 50k LOC (not a quality measure for sure!) and just barely kissed the weekly limit. I did get rate limited on some sessions for sure, but that's to be expected.
I'm curious what are people doing that is consuming your limits? I can't imagine filling the $200 a month plan unless I was essentially using Claude code itself as the api to mass process stuff? For basic coding what are people doing?
Yea, I found myself maxing out the $20/mo plan occasionally, so I tried the $100/mo, but I don't think I even once approached the session limit, let alone the weekly limit. And this is doing what I would consider heavy, continuous programming. I probably ought to go back down to the $20 one. It would be nice if they had a cheaper tier in between, but the tiers they have are probably a good business trick to get people to buy much more than they need.
This is the problem most people are facing. Before March, I had hit the rate limit a single time, and that involved a security audit of our entire code base from a few different angles.
As of now, I'm consistently hitting my 5-hour limit in less than 1 hour during NA business hours. I'm getting to the point where I basically can't use CC for work unless I work very early or late in the day.
I don't hit limits either on $100, it's more that claude-code seems to be constantly broken and they added some vague bullshit about not using claude-code before 2pm so I just don't expect it to work anymore and tend to use codex-cli as my driver nowadays. I also never hit limits in codex but... codex is $20/mo not $100/mo so it's making me consider relocating the $100 I spend to Anthropic as play money for z.ai and other tools. I think claude-code has great training wheels (codex does not) but once the training wheels come off, and claude-code becomes as unreliable as it has been then it makes you consider alternatives.
Anthropic is going through major growing pains, both technical and organizational. The left hand doesn't know what the right hand is doing. It's chaos, things are changing too quickly, and us users are getting caught in the middle of it.
Think Twitter's fail-whale problems. Sometimes you are lucky, sometimes you aren't. Why? We won't know until Anthropic figures it out and from the outside it sure looks like they're struggling.
I have the same experience as you. I’m wondering if it is regional? I’m in Europe so don’t overlap much with US usage, which is likely to be way higher
Also in Europe and can only agree. Granted I'm on the 20x plan, but I have yet to hit a limit once and I'm using Claude 12h+ per day on multiple projects.
What does it look like when you get rate limited? Does the instance just kind of sit and spin?
I suspect I was getting rate limited very aggressively on Thursday last week. It honestly infuriated me, because I'm paying $200 a month for this thing. If it's going to rate limit me, at least tell me what it's doing instead of just making it seem like it's taking 12 hours to run through something that I would expect to be 15 minutes. The worst part is that it never even finished it.
I’ve never been actually rate limited. Usage limits display in yellow when you’re above 90%. At the limit, you’ll get a red error message.
> because I'm paying $200 a month for this thing.
My gut feeling is this is not nearly enough money for them (not to mention their investors), and we'll eventually get ratcheted up in line with dev salaries. E.g. "look how many devs you didn't have to hire", etc.
People need to understand a few things: vague questions make the models roam endlessly “exploring” dead ends; “restarting” old chats immediately eats a lot of context; and Anthropic CAN change their limits and rates as they see fit, since there have never been hard promises or SLOs on these plans.
With that said, I pay for the Pro subscription ($20/mo) and I hit limits maybe two or three times over a period of 4 months building a simple running app in Python. I’d not call it production ready, but it’s not nothing either.
If people were considerably more willing to aggressively prune their context and scope tasks well, they could get a lot more done with it, at least in my experience. Anthropic can’t really fix anything because the underlying model architecture can’t be “patched”. But I definitely feel a lot of people can’t wrap their heads around the new paradigms needed to effectively prompt these models.
Additionally, opting out is always an option… but these types of issues feel more like laziness than real, structural issues with the model/harness…
This is a copypasta right? I'm damn confident I have read the same content before.
> Anthropic CAN change their limits and rates as they see fit, there’s never been hard promises or SLOs on these plans.
No they can't. When I buy an annual subscription and prepay for the year, they can't just go "ok now you get one token a month" a day in. I bought the plan as I bought it. They can't change anything until the next renewal.
Agreed. I wouldn't have bought an annual subscription under the current conditions.
They rolled out 1M context and then they start pulling this shit? I know Pro doesn't have access to the 1M context, but what a joke.
Been experiencing similar issues even with the lower tier models.
Fair transactions involve fair and transparent measurements of goods exchanged. I'm going to cancel my subscription this month.
Yes: Claude Code “consumes tokens” and starts a session while the computer is asleep with nothing started. Or it consumes 10% of my session for “What time is it?”
Or while just running the desktop app.
Running non-deterministic software on deterministic tasks is still an area where efficiency can improve.
Something similar is happening with GitHub Copilot too. It's impossible to know what a "request" is, and some change in the last couple of months has seen my request usage go up for the same style of work. Toss in the bizarre and impossible-to-understand rate limiting that occurs with regular usage, and it's pretty obvious that these companies are struggling to scale.
I'm finding the opposite with Copilot. A request is a prompt, with some caveats around what's generating the prompt. I am quite happily working with Opus 4.6 at 3x cost, and about a third of the way into the month I'm sitting at ~25% usage of a Pro+ subscription. I find it quite easy to track my usage and rate of usage.
The overall context windows are smaller with Copilot, I believe, but it doesn't appear to be hurting my work.
I'm using it for approx 4 hours a day most days. Generally one-shotting fun ideas I thoroughly plan out in planning mode first, and I have my own version of the idea -> plan -> analyse -> document implementation phases -> implement-via-agent loop: simulations, games, stuff I'm curious about, and resurrecting old projects that never really got off the ground.
I find copilot to be much more straightforward, and I can track per request against my credits. Here is the explanation of what a request is:
https://docs.github.com/en/copilot/concepts/billing/copilot-...
Switched back to Codex for the promotion. Opus at the start of the year was the GOAT: just relentless at chewing through hard problems. Now it spins on pretty easy work (took three swings just to edit a TS file) and my session is like 1-3 prompts (downgraded to the $20 plan, but still).
Ever since this change they announced:
https://www.reddit.com/r/ClaudeAI/comments/1s4idaq/update_on...
It’s been unusable for me as my daily coding agent. I run out of credits in the pro account in an hour or so. Before that I had never reached the session limit. Switched back to Junie with Gemini/chatgpt.
I pay for the lowest plan. I used to struggle to hit my quota.
Now a single question consistently uses around 15% of my quota
My take is that was the plan all along.
Once people can't think for themselves anymore and businesses expect the level of productivity witnessed before, we'll have no choice but to cough up whatever the providers bill us.
> and businesses expect the level of productivity witnessed before, we'll have no choice but to cough up whatever the providers bill us.
Is that bad? After all, even if they hiked the price to infinity, you wouldn't be worse off than if AI didn't exist, because you could still code by hand. Moreover, if it's really in a "business" (employment?) context, the tools should be provided by your employer, not least for compliance/security reasons.
Didn't they move too soon then? People haven't forgotten how to tie their shoelaces (yet). And anyway, they'll just move to a different model; last holdout wins.
Too abruptly for sure.
They probably don't have much choice with burn rates and investors, tbh. Market is shaky, etc.
And it’s working largely because the other models haven’t figured out how to provide a consistent, long-running experience.
"enshittification" gets thrown around a lot, but this is the exact playbook. Look at the previous bubble's cash cow: advertising.
Online advertising is now ubiquitous, terrible, and mandatory for anyone who wants to do e-commerce. You can't run a mass-market online business without buying Adwords, Instagram Ads, etc.
AI will be ubiquitous, and then it will get worse and more expensive. But we will be unable to return to the prior status quo.
The odds of that happening are high. Trillions invested.
It occurred to me that an outright rejection of these tools is brewing but can't quite materialise yet.
What plan are you using?
I'm on the Free tier using Claude exclusively for consultation (send a third-party codebase + ask why/where something is done). I also used to struggle to hit limits. Recently I was able to hit the limit after a single prompt.
I used Claude extensively until now and just tested Gemini 3.1 Pro yesterday via AI Studio. In Gemini CLI they don't offer it; I don't know why.
Taking a second opinion has significantly helped me design the system better, and it helped me uncover my own and Claude's blind spots.
Also, I agree that it spends and wastes a lot of tokens on web search and often gets stuck in loops.
Going forward, I will always use all three of them. My main coding agent is still Claude for now, but I'm happy to see this field evolving so fast, and it's easy to switch and use the others on the same project.
No network effects or lock in for a customer. Great to live in this period of time.
In the anticipation of a future where,
a) quotas will get restricted
b) the subscription plan prices will go up
c) all LLMs will become good enough at coding tasks
I just open sourced a coding agent https://github.com/dirac-run/dirac
The entire goal is to be token efficient (over 50% cheaper) and, by extension, to take advantage of LLMs' better reasoning at shorter context lengths.
This really started as an internal side project that made me more productive, I hope it will help others too. Apache 2.0
Currently it still can't compete with the subsidized coding-plan rates using Anthropic API pricing (even though it beats CC when both use an API key), which tells me that all subscription plan operators are losing money on these plans.
That's weird, I'm on the $100/mo and I use it for around 2-4hrs a day often with multiple terminal windows and I never even hit 20% of my quota.
It's very easy to calculate the actual cost given they list the exact tokens used. If I take the AWS Bedrock pricing for Opus 4.6 1M context (because Anthropic's APIs are subsidized and sold at a loss), here's what each category cost:
Cache reads cost $0.31
Cache writes cost $105
Input tokens cost $0.04
Output tokens cost $28.75
The total spent in the session is $134.10, while the Pro Max 5x subscription is $100.
Even taking Anthropic's API pricing, we arrive at $80.58. Below the subscription price, but not by much.
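For anyone wanting to redo this kind of comparison, here is a minimal sketch of the arithmetic. The per-million-token prices and token counts below are placeholders, not Anthropic's or Bedrock's actual rates; substitute your own session's numbers from the usage log:

```python
# Rough session-cost calculator. All token counts and per-million-token
# prices here are hypothetical placeholders, not real provider rates.
PRICES_PER_MTOK = {          # USD per million tokens (example rates)
    "input": 15.00,
    "output": 75.00,
    "cache_write": 18.75,
    "cache_read": 1.50,
}

def session_cost(tokens: dict) -> float:
    """Sum cost over token categories, given counts in raw tokens."""
    return sum(tokens[k] / 1_000_000 * PRICES_PER_MTOK[k] for k in tokens)

# Example session: mostly cache writes, matching the shape of the
# breakdown above (cache writes dominate the bill).
example = {"input": 2_000, "output": 400_000,
           "cache_write": 5_000_000, "cache_read": 2_000_000}
print(f"${session_cost(example):,.2f}")  # $126.78
```

With real per-category token counts from `/stats` or the API response metadata, the same three lines tell you whether a flat plan or metered billing would have been cheaper for that session.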
It's just the end of the free tokens, nothing to see here.
I've been feeling the squeeze too. I've tried switching between different models as a test, I can at least say it feels like the limits are about half of what they used to be a few months ago. I'd be totally willing to concede that this is just my perception if Anthropic would only release some tools for measuring your usage.
In theory the /stats command tells you how many tokens you've used, which you could use to compute how much you are getting for your subscription. In practice it doesn't contain any useful info; it may be counting what is printed to the terminal or something. My stats suggest my Claude Code usage is a tiny number of tokens, so either it's an extreme underestimate of the token count, or they are charging much more per token on the subscription than on the API (which is not supposed to be the case).
Last week's free extra usage quota shed some light on this. It seems like the reported tokens are between 1/30th and 1/100th of the actual tokens billed, judging from how they billed (/stats went up 10k tokens and I was billed $7.10). With the API it should be about $25 for a million tokens.
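Those numbers can be checked directly. Under the assumption of a flat $25/MTok blended rate, they imply roughly a 28x underreport, consistent with the 1/30th end of the stated range:

```python
# Back-of-envelope check of how much /stats underreports, using the
# figures from the comment above: $7.10 billed while /stats advanced
# by 10,000 tokens, against an assumed blended API rate of $25/MTok.
billed_usd = 7.10
reported_tokens = 10_000
usd_per_mtok = 25.0

implied_tokens = billed_usd / usd_per_mtok * 1_000_000  # ~284,000 tokens
ratio = implied_tokens / reported_tokens                # ~28x underreport
print(f"implied actual tokens: {implied_tokens:,.0f} (~{ratio:.0f}x reported)")
```

The real bill mixes input, output, and cache tokens at different rates, so the true ratio could sit anywhere in the 1/30th-1/100th band; this only shows the reported count can't be the billed count.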
I've got a dual path system to keep costs low and avoid TOS violations.
For general queries and investigation I will use whatever public/free model is available without being logged in. Not having a bunch of prior state stacked up all the time is a feature for me. This is essentially my google replacement.
For very specific technical work against code files, I use prepaid OAI tokens in VS copilot as a "custom" model (it's just gpt5.4).
I burn through maybe $30 worth of tokens per month with this approach. A big advantage of prepaying for the API tokens is that I can look at everything copilot is doing in my usage logs. If I use the precanned coding agent products, the prompts are all hidden in another layer of black box.
I had the Max plan and never reached its limit despite constantly working. Now I use the Pro plan and regularly reach the 5h limit as well as the weekly limit, as expected. I found that it makes a huge difference if you provide clear context when developing code. If you leave room open for interpretation, Claude Code uses up tokens much faster than in a well-defined context. The same is true for its time to answer getting longer if there isn't much documentation about the project.
Am I alone in thinking that it has become slower than usual to get responses?
Nope. It has become much much slower for me as well. It’s weird cause at times I will get a response very quickly, like it used to be. But most of the time I have to wait quite a bit for the simplest tasks.
For whoever else is having the same problems, it's worth upvoting these kinds of issues. There needs to be more transparency over what goes on with our subscriptions.
We vote here on HN and it's much more effective. Anyone from Anthropic reading conversations like this one should be worried: we'll jump ship if they don't address such glaring issues.
There are MANY accounts of claude degradation (intelligence, limits) over the past week on reddit and here with many posts describing people moving. Nothing is changing. You'd think they'd at least give a statement.
The nice iOS app is a big convenience for me, but I’m starting to think I should just put my $20 in Open Router. It seems like minimax is a pretty solid competitor. I’m curious if the US-centric “frontier” is just marketing.
imo that’s what I’m doing. Trialing the Hermes harness since I can hook it up to signal. StepFun 3.5 Flash for general assistant stuff and Kimi/Minimax for software development
Unless the agent code is open-sourced, there is hardly any transparency in how the agent is spending your tokens or how it calculates them. It's like asking your lawyer why they charged some amount.
Lawyers can give you a breakdown by the minute in some cases. A better example can be military contracting.
Yeah perplexity used to be great but they've also clamped down on the 20€ plan. Only one deep research query was enough to block me until the end of the month.
The thing is, if it's going to be this expensive, it's not going to be worth it for me. Then I'd rather do it myself. I'm never going to pay for a €100 subscription; that's insane. It's more than my monthly energy bill.
Maybe from a business standpoint it still makes sense because you can use it to make money, but as a consumer no way.
"Hey Claude, can you help me create a strategy to optimize my token use so I don't run into limits so often?" --> worked for me! I had two $200 plans before and now I am cool despite all day use
I had used Claude Code max as my daily driver last year and this sort of drama was par for the course. It's why I migrated entirely to Codex, despite liking Claude, the harness, more.
There's this honeymoon period with Claude you experience for a month or two followed by a trough of disillusionment, and then a rebound after a model update (rinse and repeat). It doesn't help that Anthropic is experiencing a vicious compute famine atm.
I like the term "compute famine" - it appears that all AI infrastructure is maxed out globally.
I've been using Claude Code for half a year, and these past couple of weeks have been a totally different experience. I'm on Max 20, and seeing my weekly quota go bust in ~3 days is a bit absurd when nothing has significantly changed in the way I work.
This is my exact experience as well.
It’s further frustrating that I have committed to certain project deadlines knowing that I’d be able to complete it in X amount of time with agent tooling. That agentic tooling is no longer viable and I’m scrambling to readjust expectations and how much I can commit to.
I refuse to use anthropic's models (and openai, gemini) because the math simply doesn't add up.
Add to that the fact that we are being taken for fools with dramatic announcements and FOMO messaging. I even suspect some reaction farms are going on to boost posts from people praising Claude models.
These don't happen for codex. Nor for mistral. Nor for deepseek. It can't just be that Claude code is so much better.
There are open-weight models that work perfectly fine for most cases, at a fraction of the cost. Why are more people not talking about those? Manipulation.
Mistral isn't that great. Deepseek was good when they first added thinking. But most people just try something out, and if it doesn't work on that model, then the model is bad. For Claude, Codex, and Gemini, they just are that much better now; but if they quantize or cut limits, they destabilize, and you're right: you might as well just use something worse but reliable.
I regularly compare models. You are right, Deepseek was more impressive when the latest version came out. But since then they have accepted slowing down throughput to keep the same quality.
I often compare with Gemini. Sure, those Google servers are super fast, but I can't see it being better. Qwen and Deepseek simply work better for me.
Haven't tested Mistral in a while, you may be right.
People try things out and feel comfortable using U.S. models (I can see the logic), but mostly it's brand recognition. Anthropic and OpenAI are the best, aren't they? When the models jam, users blame themselves.
How good are local LLMs at coding these days? Does anyone have any recommendations for how to get this setup? What would the minimum spend be for usable hardware?
I am getting bored of having to plan my weekends around quota limit reset times...
Get a second hand 3090/4090 or buy a new Intel Arc Pro B70. Use MoE models and offload to RAM for best bang for your buck. For speed try to find a model that fits entirely within VRAM. If you want to use multiple GPUs you might want to switch to vLLM or something else.
You can try any of the following models:
High-end: GLM 5.1, MiniMax 2.7
Medium: Gemma4, Qwen3.5
https://unsloth.ai/docs/models/minimax-m27
https://unsloth.ai/docs/models/glm-5.1
https://unsloth.ai/docs/models/gemma-4
https://unsloth.ai/docs/models/qwen3.5
https://github.com/ggml-org/llama.cpp
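Before buying hardware for any of the models above, a rough sanity check of whether a quantized model's weights fit in VRAM helps. The effective bits-per-weight and the 20% overhead factor below are ballpark assumptions, not measurements; actual KV-cache usage depends heavily on context length:

```python
# Very rough check of whether a quantized model fits on a GPU. Figures
# are ballpark assumptions (quant overhead varies by format, and KV
# cache grows with context), so treat the result as a first filter only.
def vram_needed_gb(params_b: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Weights only, plus ~20% headroom for KV cache and activations."""
    return params_b * bits_per_weight / 8 * overhead

# e.g. a 24B-parameter dense model at ~4.5 effective bits per weight
need = vram_needed_gb(24, 4.5)   # ~16.2 GB -> fits on a 24 GB 3090/4090
print(f"{need:.1f} GB")
```

For MoE models the picture is friendlier than this dense estimate suggests, since only the active experts need fast memory per token, which is why the RAM-offload advice above works.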
Make an AI usage tracker like https://marginlab.ai/trackers/codex/. These hearsay anecdotes prove nothing.
Besides some of the obvious hacks to reduce token usage, properly indexed code bases (think IntelliJ) reduce token usage significantly (30%-50%, while keeping or exceeding result quality compared with baseline) as shown with https://github.com/ory/lumen
Anthropic is not incentivized to reduce token use, only to increase it, which is what we are seeing with Opus 4.6 and now they are putting the screws on
A little off topic, but did Anthropic distill from an older OpenAI model? All of a sudden, over the last few days, I'm getting a ton of em dashes in Claude Code responses!
My personal experience is way different: I struggle to burn through more than 50% of the 5 hour limit
For context, with Google AI Pro, I can burn through the Antigravity weekly limit in 1-2 hours if I force it to use Gemini 3.1 Pro. Meanwhile, Gemini 3 Flash is basically unlimited but frequently produces buggy code or fails to implement things how I personally would (it felt like it doesn't "think" like a software dev).
I also tried VS Code + Cline + OpenRouter + MiniMax M2.7. It's quite cheap and seems better than Gemini 3 Flash, but it gets really pricey as the context fills up, because prompt caching is not supported for MiniMax on OpenRouter. The result itself usually needs 3-6 revisions on average, so the context fills up pretty often.
Eventually I got Claude Max 5x to try for a month. VS Code + Claude Code extension on a ~15k lines codebase, model set to "Default", and effort set to "Max". So far it's been really good: 0-2 revisions on average, and most of the time it implements things exactly how I would or better. And, like I said, I can only consume 40-60% of the 5-hour limits no matter how hard I try
Granted, I'm not forcing it to use Opus like OP (nor do I use complicated skills or launch multiple tasks at the same time), but I feel like they really nailed the balance of when to use which model and how to pass context between them. Or at least enough that I haven't felt the need to force it to use Opus all the time.
Reading the other negative comments makes me wonder if this is only because I'm getting a hidden newcomer's limit bonus or something though hahah
They also need to fix the 30 second lag between submitting the request and actually starting to get tokens back - it used to be instant, and still is at work where we use Anthropic models via an enterprise copilot subscription.
It's a bit shocking to me how opaque the pricing for the subscription services by the frontier labs are. It's basically impossible for people to tell what they're actually buying, and difficult to even meaningfully report or compare experiences.
How is this normal?
Yeah, I cancelled the moment I realized that the subscription is a scheme to get you to constantly dip into extra usage. I get more benefit out of Claude on the free tier than on Pro.
I think the right way to think of it, from a self-protective perspective, is this: the real offer is the per-token pricing. Use that for a while, and iff you are consistently spending more than $20/mo, treat the subscription offering as a discount on some of that usage. So only on that condition, try the subscription and see if your monthly costs go down (because of the short term rate limits, they may not, depending on your usage patterns!).
But the opacity itself is a bit offensive to me. It feels shady somehow.
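The self-protective rule above can be written down as a tiny sketch. The flat price and the monthly spend figures are illustrative, not any provider's actual rates:

```python
# Sketch of the break-even rule: track a few months of metered API
# spend and only consider the flat subscription if you consistently
# exceed it. All dollar figures here are illustrative placeholders.
SUBSCRIPTION_USD = 20.0

def should_try_subscription(monthly_api_spend: list) -> bool:
    """True only if every recent month's metered spend beat the flat price."""
    return all(m > SUBSCRIPTION_USD for m in monthly_api_spend)

print(should_try_subscription([34.10, 27.85, 41.20]))  # True
print(should_try_subscription([12.40, 27.85, 41.20]))  # False
```

The "every month" condition is deliberately conservative: as the comment notes, short-term rate limits on the subscription can mean your effective throughput drops even when the headline price looks cheaper.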
Vote with your wallet. The voting continues until the product improves or dies.
Codex is the only CLI I've had purely positive experiences with. Take that for what you will
Codex is my preferred, I use it at work. The whole "Department of War" fiasco was enough for me to say Goodbye to OAI for personal. I'm a Claude person now. It's about the same level of performance really.
We also experienced hitting our Claude limits much earlier than before during the last two weeks. Up to a degree where we were thinking it must be a bug.
Wasn't Anthropic previously offering double the token usage outside busy hours? Now they are counting tokens at the normal rate again. But yeah, it's not good. I use Codex because Claude insists on peeking at and messing with folders and files outside its work area, though.
So glad I just pay by the token.
And in classic Anthropic fashion at this point, their issues appear to just be for show. No one triages them, no one responds to them.
I guess it’s better to step down now that we can rather than wait until it becomes impossible (Stockholm syndrome)
No FOMO
This same pattern seems to occur every time a new model is about to release. I didn't notice the usage problem (I am on 20x), but Opus 4.6 feels significantly dumber for some reason. I can't quantify it, but it failed on everyday tasks where it used to complete perfectly.
Every time a new model is coming, I think they deteriorate the current one. This happens every darn time. Opus 4.6 isn't as sharp, not even close to what it was a few weeks ago.
I’m processing some images (custom board game images -> JSON) with a common layout and basic structure, and I exhausted my quota after just 30 images (pleb Pro account). I have 700 images to process…
What I did instead is tune the prompt for gemma 4 26b and a 3090. Worked like a charm. Sometimes you have to run the main prompt and then a refinement prompt or split the processing into cases but it’s doable.
Now I’m waiting for anyone to put up some competition against NVIDIA so I can finally be able to afford a workstation GPU for a price less than a new kidney.
Lol imagine how much overcharging is going on for enterprise tokens. This is just the beginning.
I feel like I am living in a bubble, no one seems to mention Antigravity in these discussions and I have not had any issues with Ultra subscription yet. It seems to go on forever and the Interface is so much better for dev work as compared to CC. (Though admittedly my experience with cc is limited).
I strongly believe google's legs will allow it to sustain this influx of compute and still not do the rug-pull like OAI or Anthropic will be forced to do as more people come onboard the code-gen use case.
You know Emacs still works.
I don't use Claude so this doesn't affect me, but I worry it will spoil the fun for me for following reason.
They inflated how much their tools burn tokens from day one, pretty much; remember all the stupid research and reports Claude always wanted to do, no matter what you asked it. Other tools are much smarter, so this is not such a big deal for them.
More importantly, these moves tend to reverberate in the industry, so I expect others will clamp down on usage a lot, and this will spoil my joy of using AI without counting every token.
Burning tokens doesn't just waste your allotment, it also wastes your time. This gave rise to the turbo offering, where you get responses faster but burn 2x your tokens.
I've seen ridiculously fast quota usage on antigravity too, where sometimes lots of work is possible, then it all goes literally within 4 questions.
Probably a combination of it being vibe coded shit and something in the backend I expect.
I mean this is expected is it not? These companies burned unimaginable amounts of investor cash to get set up and now they have to start turning a profit. They can't make up for the difference with volume because the costs are high, so the only option is to raise prices.
It's crazy, a few weeks ago the limits would comfortably last me all week. This week, I've used up half the limit in a day.
GPT-5.4 works amazingly well.
I’ve moved away from Claude and toward open-source models plus a ChatGPT subscription.
That setup has worked really well for me: the subscription is generous, the API is flexible, and it fits nicely into my workflow. GPT-5.4 + Swival (https://swival.dev) are now my daily drivers.
ChatGPT has better limits; however, the responses, even on 5.4 xtra thinking, are not as good as Sonnet's. I wish Claude would get their house in order.
Show us some receipts in the form of an exported session. I was a heavy user of Claude up until the end of Feb, but switched to Codex because it's better at handling large code bases, following the "plan", and implementing backend changes in Zig. If you ask Claude to do a review of the code and suggest fixes, then let Codex review it, then ask Claude again, it will 99% of the time say: "Oh yes, you are right, let me fix that."
Either you are using it wrong or you are working in a totally different field.
Yeah it's much better, another plus is you can use it with OpenCode (or other 3rd party tools) so you can easily switch between Codex and most other models by alright companies (not Anthropic or Google).
The two comments together sound like 2000s infomercial.
I hit the limits on the lower tiers of Codex just as fast as with Claude. At the moment I'm cycling between Claude, Codex, GLM5.1, and Kimi. The latter two are getting good enough, though, that I can make things go really far by doing planning with Opus and then switching to one of the cheap models for execution.
I have a ChatGPT Pro plan, I use it a ton, and I've never hit the limit in the past few months.
That’s why I switched to Codex. It’s so much more generous and in my experience, just as good. Also, optimizing your setup for working with agents can easily make a 5x difference.
Isn't the generous Codex plan ending? Possibly yesterday?
> As the Codex promotion on Plus winds down today
> Also, optimizing your setup for working with agents can easily make a 5x difference.
Any highlights you can share here? I'm always looking to improve my setup.
Plus, whenever Codex does something you dislike, you can just tell Codex to fix itself. Open source software is wonderful.
Especially when it's on purpose.
I don't understand Anthropic. Be consistent. Why do models deteriorate to shit? This is not good for workflows or trust. What, Opus 4.7 is gonna come out and it's the same thing again? Come on.
That's exactly why I prefer Codex.
Also on Pro Max 5x, and I hit the quota for the first time yesterday.
I spend the full 20x weekly quota in less than 10 hours. How is that possible? Well, try to mass-translate texts into 30 languages and you will hit limits extremely quickly.
Translation generally works well with very small models compared to the frontier LLMs. You can definitely run a model on your own hardware for this.
That's a really gnarly task but I'm shocked it burns 20x that fast. How large is the text? That matters more than anything.
I imagine book sized texts.
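Even without book-sized texts, the multiplication alone explains the burn rate: every target language pays for the full source text as input again, plus an output of comparable length. A rough sketch, where the tokens-per-word figure and text size are assumptions:

```python
# Ballpark of why mass translation burns quota so fast. The
# tokens-per-word ratio is a rough English-centric assumption; many
# languages tokenize less efficiently, so this is a lower bound.
def translation_tokens(words: int, languages: int,
                       tokens_per_word: float = 1.4) -> int:
    source = words * tokens_per_word
    # per language: re-send the source as input + generate similar-sized output
    return round(languages * (source + source))

# e.g. a 100k-word book into 30 languages
print(f"{translation_tokens(100_000, 30):,}")  # 8,400,000 tokens
```

Millions of tokens for one book is exactly the workload where, as the sibling comment says, a small local model is a far better fit than a frontier model on a flat plan.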
Constant complaints about Anthropic. Not much on OAI/Codex. It seems people should just use OAI and come back when they realize compute isn’t free elsewhere.
Bubble's bursting, get in.
I have the opposite conclusion.
Demand is higher than supply; it is just the start of the bubble.
Everyone and their dog is burning tokens on stupid shit that would be freed up if they would just ask for deterministic code for the task and then run that code. OpenAI and Anthropic are cutting free use and decreasing limits because they are not able to meet demand.
When the general public catches up with how to really use it, demand will fall, and the supply built today will become oversupply; that's when the bubble will burst.
I say 5 more years.
This is your regular friendly reminder that these subscriptions do not entitle you to any specific amount of usage. That "5x" is utterly meaningless because you don't know what it's 5x of.
This is by design, of course. Anyone who has been paying even the slightest bit of attention knows these subscriptions are not sustainable, and the prices will have to go up over time. Quietly reducing the usage limits that they were never specific about in the first place is much easier than raising the prices of the individual subscription tiers, with the same effect.
If you want to know what kind of prices you'll be paying to fuel your vibe coding addiction in a few years, try out API pricing for a bit, and try not to cry when your $100 credit is gone in 2 days.
So basically the Anthropic employee who responded says those 1h cache writes were almost never accessed, so a silent change to a 5m cache is in our best interest and saves cost (justifying why they did this silently).
However, his response gaslights us, because the math in the OP's opening post demonstrates this is not true: it shows reads 26x higher, so at least in his case the cache is not doing what the Anthropic employee describes.
Clearly we are being charged for less optimization here, and the message (from my perspective, from Anthropic) is: if you are in a special situation, your needs don't matter, and we will close your thread without really listening.
What also gives it away is the refusal to at least expose this TTL via a parameter, in the same sentence as informing us the 5m won't change since it's in our interest.
It's also in the interest of the users to keep certain params private; we are meant to deduce that. Did you not?
> that if you are in a special situation your needs don't matter and we will close your thread without really listening.
Are there any other $50B+ Valuation companies that care about special situations? If so, who?
Why are so many 'developers' complaining about Claude rate limiting them? You know you can actually... use local LLMs instead of donating your money to Anthropic's casino?
I guess this is fitting when the person who submitted the issue is in "AI | Crypto".
Well, there's no crying at the casino when you exhaust your usage or token limit.
The house (Anthropic) always wins.
Local LLMs are nowhere near as powerful as commercial models. Plus, they have hefty hardware requirements.
Some months ago I created a piece of software for exactly this reason. It had no success, but the idea is that communities could reduce token consumption: not everything needs an LLM, and you can share results from API calls between agents. Even though it didn't take off, I think sharing things with each other is a good concept. If you're interested, it's called tokenstree.com.