> the small model category as a whole is seeing its share of usage decline.
It's important to remember that this data is from OpenRouter... a API service. Small models are exactly those that can be self-hosted.
It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.
Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.
Openrouter has an apps tab. If you look at the free, non-coding models, some apps that feature are: janitor.ai, sillytavern, chub.ai. I'd never heard of them but people seem to be burning millions of tokens enjoying them.
If you rely on AI to write most of your code (instead of using it like Stackoverflow), Claude Code/OpenAI Codex subscription are cheaper than buying tokens. So those users are not on openrouter.
The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?
> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.
I'd be interested in a clarification on the reasoning vs non-reasoning metric.
Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output).
Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see it on an apples to apples comparison with non reasoning models.
As would models that that are overly verbose. My experience is the Claude tends to do more than is asked for (e.g. immediately move on to creating tests and documentation) while other models like Gemini tend to be more concise in what they do.
I'm out of time but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream, would you care on elaborating how you distinguish between reasoning and non-reasoning? vs "question on duty"?
I believe they’re just classifying all models into “reasoning models” eg o3 vs “non reasoning models” eg 4o and just doing a comparison of total tokens (input tokens + hidden reasoning output tokens + shown output tokens)
It's a 1.7 trillion token free model. Why wouldn't you try it?
I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).
Super interesting data.
I do question this finding:
> the small model category as a whole is seeing its share of usage decline.
It's important to remember that this data is from OpenRouter... a API service. Small models are exactly those that can be self-hosted.
It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.
Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.
According to the report, 52% of all open-source AI is used for *roleplaying*. They attribute it to fewer content filters and higher creativity.
I'm pretty surprised by that, but I guess that also selects for people who would use openrouter
Openrouter has an apps tab. If you look at the free, non-coding models, some apps that feature are: janitor.ai, sillytavern, chub.ai. I'd never heard of them but people seem to be burning millions of tokens enjoying them.
If you rely on AI to write most of your code (instead of using it like Stackoverflow), Claude Code/OpenAI Codex subscription are cheaper than buying tokens. So those users are not on openrouter.
I'm curious what percentage of claude/codex users this is true for - I assumed their business models rely on this not being true for the majority.
The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?
> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.
I'd be interested in a clarification on the reasoning vs non-reasoning metric.
Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output).
Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see it on an apples to apples comparison with non reasoning models.
As would models that that are overly verbose. My experience is the Claude tends to do more than is asked for (e.g. immediately move on to creating tests and documentation) while other models like Gemini tend to be more concise in what they do.
I'm out of time but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream, would you care on elaborating how you distinguish between reasoning and non-reasoning? vs "question on duty"?
"reasoning" models like GPT 5 et al do a pre-generation step where they:
- Take in the user query (input tokens)
- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)
- Answer (output tokens)
Because the reasoning step runs in a loop until it's run through it's action plan, it frequently uses way more tokens than the input/output step.
I believe they’re just classifying all models into “reasoning models” eg o3 vs “non reasoning models” eg 4o and just doing a comparison of total tokens (input tokens + hidden reasoning output tokens + shown output tokens)
that's exactly right!
Who is using grok code and why?
According to https://openrouter.ai/rankings, lots of people are using it - presumably because it performs well and provides value.
It's a 1.7 trillion token free model. Why wouldn't you try it?
I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).
This is really amazing data. Super interesting read