State of AI: An Empirical 100T Token Study with OpenRouter

(openrouter.ai)

74 points | by anjneymidha 2 hours ago

17 comments

lukev an hour ago

Super interesting data.

I do question this finding:

> the small model category as a whole is seeing its share of usage decline.

It's important to remember that this data is from OpenRouter... an API service. Small models are exactly those that can be self-hosted.

It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.
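
To make that selection effect concrete, here is a toy calculation. Every number in it is made up for illustration (none come from the report), and `api_share` is just a hypothetical helper name. It shows how the small-model share seen by an API provider could fall even while total small-model usage doubles.

```python
# Hypothetical token volumes in billions; invented purely for illustration.
def api_share(small_api, large_api):
    """Share of API-observed tokens that go to small models."""
    return small_api / (small_api + large_api)

# Year 1: small models are mostly used through the API.
y1 = {"small_api": 40, "small_self_hosted": 10, "large_api": 60}
# Year 2: total small-model usage doubles, but most of it is self-hosted.
y2 = {"small_api": 30, "small_self_hosted": 70, "large_api": 120}

for label, y in (("year 1", y1), ("year 2", y2)):
    total_small = y["small_api"] + y["small_self_hosted"]
    share = api_share(y["small_api"], y["large_api"])
    print(f"{label}: total small usage={total_small}B, API share={share:.0%}")
```

The API provider only ever sees the `small_api` and `large_api` columns, so the self-hosted growth is invisible to it.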

maikakz 35 minutes ago

Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.

syspec an hour ago

According to the report, 52% of all open-source AI is used for *roleplaying*. They attribute it to fewer content filters and higher creativity.

I'm pretty surprised by that, but I guess that also selects for people who would use OpenRouter.

djfergus 30 minutes ago

OpenRouter has an apps tab. If you look at the free, non-coding models, some of the featured apps are janitor.ai, sillytavern, and chub.ai. I'd never heard of them, but people seem to be burning millions of tokens enjoying them.

raincole 31 minutes ago

If you rely on AI to write most of your code (instead of using it like Stack Overflow), Claude Code/OpenAI Codex subscriptions are cheaper than buying tokens. So those users are not on OpenRouter.

djfergus 18 minutes ago

I'm curious what percentage of claude/codex users this is true for - I assumed their business models rely on this not being true for the majority.

sosodev 27 minutes ago

The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?

themanmaran 2 hours ago

> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

I'd be interested in a clarification on the reasoning vs non-reasoning metric.

Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output)?

Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see an apples-to-apples comparison with non-reasoning models.

ribosometronome 28 minutes ago

As would models that are overly verbose. My experience is that Claude tends to do more than is asked for (e.g. immediately moving on to creating tests and documentation), while other models like Gemini tend to be more concise in what they do.

reeeli an hour ago

I'm out of time, but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream. Would you care to elaborate on how you distinguish between reasoning and non-reasoning? vs "question on duty"?

themanmaran an hour ago

"Reasoning" models like GPT-5 et al. do a pre-generation step where they:

- Take in the user query (input tokens)

- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)

- Answer (output tokens)

Because the reasoning step runs in a loop until it has worked through its action plan, it frequently uses way more tokens than the input/output step.
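
As a rough sketch of that accounting (the `Usage` class and all token counts are made up for illustration, not taken from any provider's API), the same prompt sent to a reasoning and a non-reasoning model could be tallied like this:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure)."""
    input_tokens: int
    reasoning_tokens: int  # hidden planning tokens; 0 for a non-reasoning model
    output_tokens: int

    @property
    def total(self) -> int:
        # input + reasoning + output
        return self.input_tokens + self.reasoning_tokens + self.output_tokens

    @property
    def total_excluding_reasoning(self) -> int:
        # input + output only
        return self.input_tokens + self.output_tokens


# Invented numbers: same prompt and same visible answer length for both models.
plain = Usage(input_tokens=200, reasoning_tokens=0, output_tokens=300)
reasoning = Usage(input_tokens=200, reasoning_tokens=1500, output_tokens=300)

print(plain.total, reasoning.total, reasoning.total_excluding_reasoning)
```

With these invented numbers the two requests look identical once reasoning tokens are excluded, which is why it matters which total a metric uses.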

typs an hour ago

I believe they’re just classifying all models into “reasoning models” (e.g. o3) vs “non-reasoning models” (e.g. 4o) and just doing a comparison of total tokens (input tokens + hidden reasoning output tokens + shown output tokens).
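
A minimal sketch of that metric, assuming this reading is right. The record layout, the `REASONING_MODELS` set, and the token counts are all hypothetical; only the o3 and 4o labels come from the examples above.

```python
# Hypothetical classification: which model names count as "reasoning models".
REASONING_MODELS = {"o3"}


def reasoning_share(records):
    """Fraction of all tokens served by reasoning models.

    records: iterable of (model, input, hidden_reasoning, shown_output).
    """
    total = 0
    by_reasoning = 0
    for model, inp, hidden, shown in records:
        tokens = inp + hidden + shown  # every token of the request counts
        total += tokens
        if model in REASONING_MODELS:
            by_reasoning += tokens
    return by_reasoning / total if total else 0.0


# Invented request log.
records = [
    ("o3", 100, 700, 200),
    ("4o", 100, 0, 400),
    ("4o", 200, 0, 300),
]
print(reasoning_share(records))
```

Note that the whole request is attributed to the model's class, so input and shown output tokens of a reasoning model count toward the share too, not only its hidden reasoning tokens.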

maikakz an hour ago

that's exactly right!

asadm 35 minutes ago

Who is using grok code and why?

joshuamcginnis 4 minutes ago

According to https://openrouter.ai/rankings, lots of people are using it - presumably because it performs well and provides value.

djfergus 25 minutes ago

It's a 1.7 trillion token free model. Why wouldn't you try it?

I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).

typs an hour ago

This is really amazing data. Super interesting read