Open Source as it gets in this space, top notch developer documentation, and prices insanely low, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!
Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, Chinese ecosystem has delivered a complete AI stack. Like it or not, that's a big news. But what's there not to like when monopolies break down?
The incredible arrogance and hybris of the American initiated tech war - it is just a beautiful thing to see it slowly fall apart.
The US-China contest aside - it is in the application layer llms will show their value. There the field, with llm commoditization and no clear monopolies, is wide open.
There was a point in time where it looked like llms would the domain of a single well guarded monopoly - that would have been a very dark world. Luckily we are not there now and there is plenty of grounds for optimism.
Still not sure how I feel about China of all places to control the only alternative AI stack, but I guess it's better than leaving everything to the US alone. If China ever feels emboldened enough to go for Taiwan and the US descends into complete chaos, the rest of the world running on AI will be at the mercy of authoritarian regimes. At the very least you can be sure noone is in this for the good of the people anymore. This is about who will dominate the world of tomorrow. And China has officially thrown their hat in the ring.
I always find it an illuminating experience about the power of mass propaganda every time I see an American believe they somewhat have the moral high ground over China, despite starting a new war somewhere around the globe either for petrol or on behalf of Israel every six months.
Isn't the US building mass detention camps right now for all the brown people there? And arresting / detaining / demanding papers from any and everyone? With federal agents killing civilians?
Don't get me wrong, China is also horrible here, they have their own camps.
But pretending the US is positive wrt human rights is a wild take in 2026.
> Just because America is doing bad things doesn't mean China is good, or vice versa.
Of course not. When it comes to SOTA LLMs you have the choice between two bad options. For many, choosing the Chinese option is just choosing the lesser of two evils (and it's much cheaper).
For now indeed, the people that want to get rid of it are currently in power.
The US was one of the first democracies in the world, and many countries followed suit. But the US hasn't kept up, and now the powers that be have exploited the weaknesses in the system. With arguably the biggest one being giving the president too much power (appointing supreme court justices, executive orders, etc).
Democracy in most of the countries is just theater. Trump promised no more wars iirc.
Don't get me wrong, I'd rather live in a country without a million cameras that automatically fine me for crossing the street illegally but I don't actually deceive myself in thinking my vote counts for much.
Moral stances aside, I'd argue it's healthy that the US gets competition from abroad. I appreciate the boost that the world is getting from China - infrastructure and construction projects are a huge benefit to economies. Their focus on green energy has caused a huge influx of affordable solar panels, home batteries, EVs, etcetera, helping reduce the dependency on fossil fuels - while the US and especially the other big money spenders in the middle east would rather the world remain fully dependent on them. But for the past years Europe and now Asia are feeling the pain from being overly reliant on that.
China's policies and government aren't morally defensible and I do fear that they will become more aggressive in spreading their influence and policies onto other countries, but from an economic standpoint what they're doing is super effective. While the previous world power (the US) is stuck in infighting and going through cycles of fixing/undoing the previous administration's damages, instead of planning ahead.
Competition with the Soviet Union gave all the workers in the world better conditions, also advances in science and technology... (And risk of mutual destruction ;)), even if the USSR wasn't good.
AFAIK: Current Mistral models are not competitive with SOTA-models that come out of the USA or China. They are "good enough" for enterprise usage when you don't need SOTA performance.
Their main selling point is: They are neither US-American nor Chinese. That's a real moat in today's world. I think at the moment they feel quite comfortable.
I don’t know if we’re ahead of the curve but that tired feeling has started turning into hate here in the EU. I guess being threatened with invasion does that to you.
The next decade is going to look very different with America Alone.
I grew up in the states when I was younger, always feeling some closeness to Americans even after I moved back to Europe.
With all that goes on it has changed. Recently I sat on a plane near some Americans discussing their holidays here, and I noticed I felt contempt. Sitting their with insane privilege as their government torches the world.
Individuals remain individuals, and one really ought not to be prejudice. However the lack of resistance I see in in the “land of the free” as their “democratic” institutions collapse just makes me believe they never cared at all. In France cars are torched if the pension age is raised. In America the rise facism apparently doesnt matter to them.
I a European who spent the last decade in America and I'm not sure I'd call Americans privileged compared to Europe. With money being the one means you have to be treated well in society, comparing it to Europe, America feels like the hunger games. Want healthcare (ie surviving)? Healthy food? To own your house? Welcome to the games
From my small bubble it's not that. I'm Dutch, married to an American who now knows enough Dutch such that we can treat it as a secret language when we're in the US.
My family in law seems to swing slightly republican. As a Dutchie, I could get some answers because I'm too naive not to talk about politics. So I got to probe a bit. What I simply found was that they'd say "I can't trust the news, none of it. Not CNN, not Fox News, nothing". Then I'd say "well in the Netherlands, I'd argue that while news outlets have their bias, you can trust them on basic factual reporting". She looked at me with a stare that I could only describe as "oh but honey, you're too young and naive to understand". To which I thought "you don't know the Netherlands. We're not perfect but we're nowhere near as deranged as what I'm seeing here".
I think that explains a lot of it for some people. The trust in the media, all media, is completely broken. Trump has how many fellonies now? Can't trust it. Kamala is doing what now? All talk. DOGE is fixing the government? I fucking hope so! But can't trust the damn news. Whether they do or don't, they are always burning money, god damn bureaucrats.
I feel that's the mindset that my family in law has.
My running hypothesis has been the trust breakdown arises from social-media overexposure driving lazy nihilism, which in turn gave free reign to a uniquely-corrupt class of politicians. But I'm not sure how to neutrally evaluate that.
not all of us are just "sitting here with insane privilege." it's quite dangerous for some of us right now.
I'm trans. this Administration does not like us. after Charlie Kirk's murder, things got legitimately scary. Musk was retweeting people who called us "deranged bioweapons" who needed to be "forcibly institutionalized." NSPM-7 is surveilling and infiltrating trans organizations. the Heritage Foundation proposed labeling us as "ideological extremists," in the same category as neo-Nazis. if I'm arrested, I'll go to a men's prison where I'll likely be given to a violent inmate as his cellmate to "pacify" him (V-coding.)
so yeah, I keep my head down. a lot of Jews kept their heads down in Germany in the '30s, you know? and just like then, it doesn't seem like other countries are too keen on taking us in as refugees. I hope that changes if things get bleak.
It’s not that it doesn’t matter to Americans. It is worse; half the population (or at least, half the voting population), is thrilled with the development of fascism. The other half has been ringing the alarm bells for well over a decade; it seems to make no difference.
And you’re right, most Americans do not understand the privileges they have or give one single shit about democracy; it is just not a salient political issue. But eggs… don’t get me started on eggs.
It's probably a bit more nuanced than "half this, half that"; when you look at the facts, most voters aren't that extremist. A lot of votes vote one way or the other because they would simply never vote for the other.
This is why the swing voters / swing states are so important in the US, because only a few million are flexible enough to switch sides.
Of course the core issue is that there's a two party system; while I'm sure that in a healthy democracy the current republican and democrat parties would be the bigger ones, they wouldn't have a majority.
> In France cars are torched if the pension age is raised.
This is not something to be proud of. You guys are giving yourself loaned freebies, retiring 5+ (!) years earlier than countries like BeNeLux and Germany, and are pretty much expecting the EU to eventually pick up the pieces which will drag us all down.
Other countries don't directly pay for the pensions, but France is staring into a giant fiscal abyss because of their low retirement age (and other generous social benefits). Any attempt to change those results in the country being taken hostage by rioters, thus nothing changes.
At some point France will be in too deep shit in look to the EU to cover for them. We will all pay for that. And it is deeply unfair because other countries their citizens have accepted later retirement and more frugal benefits to keep their country fiscally healthy.
France could cover the fiscal hole in other ways, but taxing corporations and wealth higher also consistently ends up being blocked. And each year the hole gets deeper.
The reason you see the distancing from America in European media is the Republican government. Democrats were not better to Europe than Republicans, but you have to understand that current European leaders are vassals to the US Democrats and the Epstein-List".
In the near future, one of two things will happen:
- Democrats will regain power, and the EU will be happy with the USA again.
- EU regime will swear an oath to the Republicans and continue to be vassals to them and the Epstein-List.
Whatever happens, one thing is sure: Epstein-List will stay in control.
america is a continent. let’s take back our vocabulary (fellow european here).
the little orange man shows very well what i mean when he started giving names to the gulf of mexico.
People in China live under totalitarian rule, that much is true.
But how free is the average North American, where getting sick can bring you and your family financial ruin? Where the "free press" is controlled by corporations who are also the main source of campaign funding for politicians? Where their urban spaces are designed to require you to have a car and promote complete atomized individuals?
..you forgot to mention that any technology in China, foreign or domestic, can and will be used for and to the benefit of the -military- party.. But like someone posted: "not perfect" fits the bill.
Check out the Sean Ryan Show with Palmer Luckey on China and military tech.
tiananmen square was in 1989. Hong Kong was snuffed out like a light. Covid saw people caged and sealed in their houses. You do not need to look back at the cultural revolution to see the prc for what it is.
Is your contention that Hong Kong is also a totalitarian society? Have you been to Hong Kong in the last 5 years? I feel like people saying these sorts of things are just completely divorced from reality.
> Covid saw people caged and sealed in their houses.
No. There were a few incidents very early on, when everyone was (quite understandably) panicking about a new, deadly virus that nobody had ever seen before, when some local city officials barred the doors of people who had just come from Wuhan. That was a scandal inside China, and it was immediately reversed.
What China did do quite extensively was border quarantine, and during localized outbreaks (caused by cases that slipped through quarantine at the border), mass testing and quarantine measures. This was during a once-in-a-generation pandemic that killed millions of people. In China, these measures saved several million lives. The estimates are that China's overall death rate was about 25% that of the US, and these measures are the reason. By the way, Taiwan and Australia took nearly identical measures, and I very much doubt that you would call them totalitarian societies.
As a different Brit I do not accept such moral relativism.
China’s governments actions are on a completely different level - for example:
“””
Since 2014, the government of the People's Republic of China has committed a series of ongoing human rights abuses against Uyghurs and other Turkic Muslim minorities in Xinjiang which has often been characterized as persecution or as genocide.
Why do we ignore all the human right abuses the US perform abroad? Iraq, Afghanistan, now Iran, Gaza and Lebanon through Israel, support to Saudi Arabia (which would not exist without the US), El Salvador... And inside it's also horrible with its treatment to immigrant.
That should be at least comparable (if not worse) than what China is doing.
It's 2026 and people still believe this Uyghur genocide propaganda? In the meantime, Israel and the US have been killing people in the middle east for years, but china is "on a completely different level"?
This is such a tired argument, and morally repugnant. Where is the UK in the race, where is the EU? Lets get of our asses and stop moralizing.
(China wiped out the entire EU industry through a "quiet" trade war since like the last 15 years, and we're not really talking about that aren't we...)
Not so much a trade war as basic economic forces, and it's been going on for much longer than that. When infrastructure improves, companies and customers can look further to get their stuff done. If it's cheaper to do your industrial or manufacturing work abroad and have it transported to your country, that just happens.
The powers that be try to slow this down by banning imports outright (you can't for example import American chicken into Europe because of food safety laws), or high import taxes (Chinese EVs have a 50% import tax in Europe and the US to protect the local car manufacturers. Which is fair because the Chinese EV manufacturers are state-sponsored so their prices are unfair. Then again, western companies get billions in investor money to push the prices down).
You mean the west handed their industry to china over the last 15 years? Its not like the US is any better off in this. The EU is not a country, so you can't talk about it as if it was. Each country has their own companies and industries. There is AI in Europe, and its growing, however we might not be as "energetic" about destroying our countries to build giant data centers to serve our billionaire overlords. That does not mean that there is no investment, there is, including a bunch of American corporations like Amazon. But there is also a lot of corruption and bribing (lobbying - lets call it what it really is, no more whitewashing) going on around that too.
So again, stop referring to EU as a country, we are not, and it just annoys any Europeans as it comes of as "Americans who don't understand the world outside of the USA".
They may say that, but we can't assume that. Given it's Chine, we actually have to assume it's mostly false. And since there'll never be a proper audit by an independent party of all of their data centers, we'll never know.
Let's see how long it takes before the big US AI companies start lobbying to outright ban use of Chinese AI, even the open source / local models. For "national security" reasons, of course.
For me open source means that the entire training data is open sourced as well as the code used for training it otherwise it's open weight. You can run it where you like but it's a black box. Nomic's models are good example of opensource.
To be fair I prefer the Chinese models censorship (yes, seriously) because if you ask certain topics they just don't answer instead of giving skewed answers.
Just ask it for a summary of the USA’s role in Iran, Gaza, Lebanon and its recent threats against Panama, Cuba and Greenland! It might be able to keep track.
Ask Gemini today if the United States is trying to destroy the nation of Iran, and it will feed you the (white-washed) party line, straight from the White House, with a bit of 'some people disagree' thrown in. No mention of America's threats of "Complete annihilation", "Killing a civlization", and all the rest.
> Summary: The U.S. is currently engaged in an active war aimed at dismantling the Iranian government and its military capabilities, but it distinguishes this from destroying the country or its people. However, the humanitarian impact—including civilian casualties from airstrikes and the domestic crackdown by Iranian security forces—has led many international observers to warn that the campaign risks long-term instability and "state collapse" rather than a simple transition of power.
It does do quite a bit better if you ask it about the genocide in Gaza, summarizing the case for it, and citing only token justifications from the guilty party.
As of April 2026, Gemini is... For very obvious reasons, highly biased towards cultural consensus. If your cultural consensus is strong on some really messed up things, that's the outcome that it's going to give you.
> Isn't there a difference between the models output reflecting the mean of public discourse and the active adjustment of information by the government?
Not as much a difference as you would wish, as mean of public discourse is very actively managed, to our collective detriment, by a very small group of powerful people, which often includes the government. It's the nature of mass media.
They Thought They Were Free, and all that. By the time the 'mean of public discourse' centers on something incredibly stupid or awful, nobody can be arsed to figure out who planted that idea in our heads.
Theoretically yes. It is entirely possible to poison the training data for a supply chain attack against vibe coders. The trick would be to make it extremely specific for a high value target so it is not picked up by a wide range of people. You could also target a specific open source project that is used by another widely used product.
However there is so many factors involved beyond your control that it would not be a viable option compared to other possible security attacks.
I don't mean that flippantly. These things are dumped in the wild, used on common (largely) open source execution chains. If you find a software exploit, it's going to affect your population too.
Wet exploits are a bit harder to track. I'd assume there are plenty of biases based on training material but who knows if these models have a MKUltra training programme integrated into them?
From my experience, kinda the opposite? It's like Chinese software is... Harder to weaponize or hurt yourself on. Deepseek is definitely censored, but I've never caught it being dishonest in a sneaky way.
In their paper, point 5.2.5 talks about their sandboxing platform(DeepSeek Elastic Compute). It seems like they have 4 different execution methods: function calls, container, microVM and fullVM.
This is a pretty interesting thing they've built in my opinion, and not something I'd expect to be buried in the model paper like this. Does anyone have any details about it? Google doesn't seem to find anything of note, and I'd love to dive a bit deeper into DSec.
There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting.
I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running these problems through first (often with about 2-6 papers for context) and then requesting a rigorous proof as followup. Since the problems are pretty tough, there is no quantitative measure of performance here, I'm just judging based on how useful the output is toward outlining a solution that would hopefully become publishable.
Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight for some of the harder problems (insightful guesses on relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single followup prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insights in the first response as Gemini (closer to GPT-5), but it often gets much better in the followup, and the proofs can be _very_ impressive; nearly complete in several cases.
Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.
Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!
Wondering how gpt 5.5 is doing in your test. Happy to hear that DeepSeek has good performance in your test, because my experience seems to correlate with yours, for the coding problems I am working on. Claude doesn't seem to be so good if you stray away from writing http handlers (the modern web app stack in its various incarnations).
I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):
- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.
- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.
- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.
- One problem pertaining to bounds on integral probability metrics for time-series modelling.
It would be wonderful to have a deeper insight, but I understand that you can disclose your identity (I understand that you work in applied research field, right ? )
First you clone the API of the winner, because you want to siphon users from its install-base and offer de-risked switch over cost.
Now that you’re winning, others start cloning your API to siphon your users.
Now that you’re losing, you start cloning the current winner, who is probably a clone of your clone.
Highly competitive markets tend to normalize, because lock-in is a cost you can’t charge and remain competitive. The customer holds power here, not the supplier.
Thats also why everyone is trying to build into the less competitive spaces, where they could potentially moat. Tooling, certs, specialized training data, etc
Our (western) economic model forces competing individual companies to be profitable quickly. China can ignore DeepSeek losing money, because they know developing DeepSeek will help China. Not every institution needs to be profitable.
yes, they want to win the same way they won more or less every other economic competition in the last 30 years, scale out, drop prices and asphyxiate the competition.
Yeah, it’s an interesting one. I think inertia and expectations at this point? I don’t think the big labs anticipated how low the model switching costs would be and how quickly their leads would be eroded (by each other and the upstarts)
They are developing their moats with the platform tooling around it right now though. Look at Anthropic with Routines and OpenAI with Agents. Drop that capability in to a business with loose controls and suddenly you have a very sticky product with high switching costs. Meanwhile if you stick with purely the ‘chat’ use cases, even Cowork and scheduled tasks, you maintain portability.
No, they are not. If they were "racing to AGI" they would be working together. OpenAI would still be focused on being a non-profit. Anthropic wouldn't be blocking distillation on their models.
If by AGI you mean IPO, sure. I genuinely don't believe Dario nor Sam should be trusted at this point. Elon levels of overpromising and underdelivering.
If you want other people to know whether you're being genuine or sarcastic, you'll have to put a bit more effort into your comments. Your comment just adds noise.
It's interesting that they mentioned in the release notes:
"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."
>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead
Pretty cool, I think they're the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.
Deepseek v4 is basically that quiet kid in the back of the class who never says a word but casually ruins the grading curve for everyone else on the final exam.
I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.
Research and training costs have to be amortized from somewhere; and labs are always training. I'm definitely keen for the financials when the two files for IPO though, it would be interesting to see; although I'm sure it won't be broken down much.
This price is high even because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 computing cards are launched in the second half of the year, the price of the Pro version will drop significantly
It's the decades of performance doesn't matter SV/web culture. I'd be surprised if over 1% of OpenAI/Anthropic staff know how any non-toy computer system works.
I was thinking the same. How can it be than other providers can offer third-party open source models with roughly the similar quality like this, Kimi K2.6 or GLM 5.1 for 10 times less the price? How can it be that GPT 5.5 is suddenly twice the price as GPT 5.4 while being faster? I don't believe that it's a bigger, more expensive model to run, it's just they're starting to raise up the prices because they can and their product is good (which is honest as long as they're transparent with it). Honestly the movement about subscription costing the company 20 times more than we're paying is just a PR movement to justify the price hike.
Anthropic recently dropped all inclusive use from new enterprise subscriptions, your seat sub gets you a seat with no usage. All usage is then charged at API rates. It’s like a worst of both worlds!
SSO Tax is a large part of it, controls around plug-in marketplace, enforcement of config, observeability of spend. But it’s all pretty weak really for $20 a month.
And Microsoft are going the same route to moving Copilot Cowork over to a utilisation based billing model which is very unusual for their per seat products (I’m actually not sure I can ever remember that happening).
My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.
I haven't seen anyone claiming that API prices are subsidized.
At some point (from the very beginning till ~2025Q4) Claude Code's usage limit was so generous that you can get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h window) - and for good reason, because LLM agentic coding is extremely token-heavy, people simply wouldn't return to Claude Code for the second time if provided usage wasn't generous or every prompt costs you $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limit in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.
Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open weight models is super exciting.
I'm still playing with the new Qwen3.6 35B and impressed, now DeepSeek v4 drops; with both base and instruction-tuned weights? There goes my weekend :P
But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.
Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.
As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.
Is there evidence that frontier models at anthropic, openai or google or whatnot are not using comparable optimizations to draw down their coats and that their markup is just higher because they can?
I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.
> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
One answer - Chinese Communist Party. They are being subsidized by the state.
Their audience is people who build stuff, techs audience is enterprise CEOs and politicians, and anyone else happy to hype up all the questionably timed releases and warnings of danger, white collar irrelevence, or promises of utopian paradise right before a funding round.
doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?
no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.
There is an easy fix already in widespread use: "open weights".
It is very much a valuable thing already, no need to taint it with wrong promise.
Though I disagree about being used if it was indeed open source: I might not do it inside my home lab today, but at least Qwen and DeepSeek would use and build on what eg. Facebook was doing with Llama, and they might be pushing the open weights model frontier forward faster.
So, this is the version that's able to serve inference from Huawei chips, although it was still trained on nVidia. So unless I'm very much mistaken this is the biggest and best model yet served on (sort of) readily-available chinese-native tech. Performance and stability will be interesting to see; openrouter currently saying about 1.12s and 30tps, which isn't wonderful but it's day one after all.
For reference, the huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to nVidia's H100 from 2022. In other words, things are hotting up in the GPU war!
Can't see how NVIDA justifies its valuation/forward P/E ratio with these developments and on-device also becoming viable for 98% of people's needs when it comes to AI
On-device is incredibly far away from being viable. A $20 ChatGPT subscription beats the hell out of the 8B model that a normal computer can run.
Nvidia's forward PE ratio is only 20 for 2026. That's much lower than companies like Walmart and Costco. It's also growing nearly 100% YoY and has a $1 trillion backlog.
I do think Nvidia isn't that badly priced; they still have the dominance in training and the proven execution
Biggest risk I see is Nvidia having delays / bad luck with R&D / meh generations for long enough to depress their growth projections; and then everything gets revalued.
While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%.
Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.
For those who rely on open source models but don't want to stop using frontier models, how do you manage it? Do you pay any of the Chinese subscription plans? Do you pay the API directly? After GPT 5.5 release, however good it is, I am a bit tired of this price hiking and reduced quota every week. I am now unemployed and cannot afford more expensive plans for the moment.
Have you considered... not subscribing? You can ask the top models via chats for specific stuff, and then set up some free CLI like mistral.
If you're trying to make a buck while unemployed, sure get a subscription. Otherwise learn how to work again without AI, just focus on the interesting stuff.
I have $20 ChatGPT subscription. Stopped Anthropic $20 subscription since the limit ran out too fast. That's my frontier model(s).
For OSS model, I have z.ai yearly subscription during the promo. But it's a lot more expensive now. The model is good imo, and just need to find the right providers. There are a lot of alternatives now. Like I saw some good reviews regarding ollama cloud.
I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.
The players barely ever change. People don't have problems following sports, you shouldn't struggle so much with this once you accept top spot changes.
I didn't express this well but my interest isn't "who is in the top spot", and is more _why and _how various labs get the results they do. This is also magnified by the fact that I'm not only interested in hosted providers of inference but local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor it's more than following the national leagues but also following college and even high school leagues as well. And the real interest isn't even who's doing well but WHY, at each level.
It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.
At this point I would just pick the one who's "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
It honestly has all kinda felt like more of the same ever since maybe GPT4?
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
For coding Opus 4.5 in q3 2025 was still the best model I've used.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
Its on OR - but currently not available on their anthropic endpoint. OR if you read this, pls enable it there! I am using kimi-2.6 with Claude Code, works well, but Deepseek V4 gives an error:
`https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns
an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist"
For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.
American companies want a scan of your asshole for the privilege of paying to access their models, and unapologetically admit to storing, analyzing, training on, and freely giving your data to any authorities if requested. Chinese ulteriority is hypothetical, American is blatant.
It’s not remotely hypothetical you’d have to be living under a rock to believe that. And the fusion with a one-party state government that doesn’t tolerate huge swathes of thoughtspace being freely discussed is completely streamlined, not mediated by any guardrails or accountability.
This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
As a non-American, everything you wrote other than "one party" applies to the current US regime.
Relatively speaking, DeepSeek is less untrustworthy than Grok.
When I try ChatGPT on current events from the White House it interprets them as strange hypotheticals rather than news, which is probably more a problem with DC than with GPT, but whatever.
> And the fusion with a one-party state government that doesn’t tolerate huge swathes of thoughtspace being freely discussed
That would be a great argument if the American models weren’t so heavily censored.
The Chinese model might dodge a question if I ask it about 1-2 specific Chinese cultural issues but then it also doesn’t moralize me at every turn because I asked it to use a piece of security software.
>This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
The oppression of people in China like Uyghurs and Hong Kong, the complete lack of free speech, the saber-rattling at neighbours, and the lack of respect for intellectual property are indeed all well documented.
But for folks on the opposite side of the world, the threats are more like "they're selling us electric cars and solar panels too cheaply" and the hypothetical "these super cheap CCTV cameras could be used for remote spying"
> This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
This is why I’ve been urging everyone I know to move away from American based services and providers. It’s slow but honest work.
Pretty sure you guys have a strong laws about free-speech, and criticizing elites is part of that. Though there are some groups that do not really want the 1st amendment to be a thing.
Foreigners are literally being denied entry into the country due to opposing viewpoints expressed on social media. People have to disable FaceID on their phones prior to going through customs in case an agent decides to investigate whether their political views are in opposition to the current administration.
> And you're saying Americans aren't banned from criticising their elites?
Half the country would be locked up right now if they weren’t allowed to criticize Trump. Have you even paid attention to how much he’s shitted on, on a daily basis?
It's a little sad that tech now comes down to geopolitics, but if you're not in the USA then what is the difference? I'm Danish, would I rather give my data to China or to a country which recently threatened the kingdom I live in with military invasion? Ideally I'd give them to Mistral, but in reality we're probably going to continue building multi-model tools to make sure we share our data with everyone equally.
> Internet comments say that open sourcing is a national strategy, a loss maker subsidized by the government. On the contrary, it is a commercial strategy and the best strategy available in this industry.
This sounds whole lot like potatoh potahto. I think the former argument is very much the correct one: China can undercut everyone and win, even at a loss. Happened with solar panels, steel, evs, sea food - it's a well tested strategy and it works really well despite the many flavors it comes in.
That being said a job well done for the wrong reasons is still a job well done so we should very much welcome these contributions, and maybe it's good to upset western big tech a bit so it's remains competitive.
It is not only that Chinese labs can undercut on price. It is that they must. They must give away their models for free by open sourcing them, and they must even give away free inference services for people to try them. That is the point of the post.
There is not ‘must’ here, they did not ‘have’ to undercut every other strategically and technologically important industry the rest of the world has, but they did as a point of national policy.
Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there. And their papers are extremely pro-social to help the broader open AI community. This is why they struggle getting funded because investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology requires to reproduce these weights.
So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.
This model is as open source as a windows XP installation ISO.
I don't think we need to compare models to Opus anymore. Opus users don't care about other models, as they're convinced Opus will be better forever. And non-Opus users don't want the expense, lock-in or limits.
As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.
Which model's best depends on how you use it. There's a huge difference in behaviour between Claude and GPT and other models which makes some poor substitutes for others in certain use cases. I think the GPT models are a bad substitute for Claude ones for tasks such as pair-programming (where you want to see the CoT and have immediate responses) and writing code that you actually want to read and edit yourself, as opposed to just letting GPT run in the background to produce working code that you won't inspect. Yes, GPT 5.4 is cheap and brilliant but very black-box and often very slow IME. GPT-5.4 still seems to behave the same as 5.1, which includes problems like: doesn't show useful thoughts, can think for half an hour, says "Preparing the patch now" then thinks for another 20 min, gives no impression of what it's doing, reads microscopic parts of source files and misses context, will do anything to pass the tests including patching libraries...
Agree with your assessment, I think after models reached around Opus 4.5 level, its been almost indistinguishable for most tasks. Intelligence has been commoditized, what's important now is the workflows, prompting, and context management. And that is unique to each model.
Same for me. There are tasks when I want the smartest model. But for a whole lot of tasks I now default to Sonnet, or go with cheaper models like GLM, Kimi, Qwen. DeepSeek hasn't been in the mix for a while because their previous model had started lagging, but will definitely test this one again.
The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.
Is Opus nerfed somehow in Copilot? Ive tried it numerous times, it has never reallt woved me. They seem to have awfully small context windows, but still. Its mostly their reasoning which has been off
Codex is just so much better, or the genera GPT models.
Try Charm Crush first, it's a native binary. If it's unbearable, try opencode, just with the knowledge your system will probably be pwned soon since it's JS + NPM + vibe coding + some of the most insufferable devs in the industry behind that product.
If you're feeling frisky, Zed has a decent agent harness and a very good editor.
actually this is not the reason - the harness is significantly better.
There is no comparable harness to Claude Code with skills, etc.
Opencode was getting there, but it seems the founders lost interest. Pi could be it, but its very focused on OpenClaw. Even Codex cli doesnt have all of it.
What's the issue with OC? I tried it a bit over 2 months ago, when I was still on Claude API, and it actually liked more that CC (i.e. the right sidebar with the plan and a tendency at asking less "security" questions that CC). Why is it so bad nowadays?
eh idk. until yesterday opus was the one that got spatial reasoning right (had to do some head pose stuff, neither glm 5.1 nor codex 5.3 could "get" it) and codex 5.3 was my champion at making UX work.
So while I agree mixed model is the way to go, opus is still my workhorse.
How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6
In theory, sure, but as other have pointed out you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.
In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.
There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.
And that GPU wouldn’t run one instance, the models are highly parallelizable. It would likely support 10-15 users at once, if a company oversubscribed 10:1 that GPU supports ~100 seats. Amortized over a couple years the costs are competitive.
> There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.
Obviously, and certainly companies do run their own models because they place some value on data sovereignty for regulatory or compliance or other reasons. (Although the framing that Anthropic or OpenAI might "steal their data" is a bit alarmist - plenty of companies, including some with _highly_ sensitive data, have contracts with Anthropic or OpenAI that say they can't train future models on the data they send them and are perfectly happy to send data to Claude. You may think they're stupid to do that, but that's just your opinion.)
> the models are highly parallelizable. It would likely support 10-15 users at once.
Yes, I know that; I understand LLM internals pretty well. One instance of the model in the sense of one set of weights loaded across X number of GPUs; of course you can then run batch inference on those weights, up to the limits of GPU bandwidth and compute.
But are those 100 users you have on your own GPUs usings the GPUs evenly across the 24 hours of the day, or are they only using them during 9-5 in some timezone? If so, you're leaving your expensive hardware idle for 2/3 of the day and the third party providers hosting open weight models will still beat you on costs, even without getting into other factors like they bought their GPUs cheaper than you did. Do the math if you don't believe me.
To me, the important thing isn't that I can run it, it's that I can pay someone else to run it. I'm finding Opus 4.7 seems to be weirdly broken compared to 4.6, it just doesn't understand my code, breaks it whenever I ask it to do anything.
Now, at the moment, i can still use 4.6 but eventually Anthropic are going to remove it, and when it's gone it will be gone forever. I'm planning on trying Deepseek v4, because even if it's not quite as good, I know that it will be available forever, I'll always be able to find someone to run it.
No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.
But can be, and is, done. I work for a bootstrapped startup that hosts a DeepSeek v3 retrain on our own GPUs. We are highly profitable. We're certainly not the only ones in the space, as I'm personally aware of several other startups hosting their own GLM or DeepSeek models.
Completely agree, not suggesting it needs ot just genuinely curious. Love that it can be run locally though. Open source LLMs punching back pretty hard against proprietary ones in the cloud lately in terms of performance.
- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.
- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).
Truly an insane industry. This is a good reminder of why datacenter capex from since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
I remember reading about some new frameworks have been coming out to allow Macs to stream weights of huge models live from fast SSDs and produce quality output, albeit slowly. Apart from that...good luck finding that much available VRAM haha
It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.
It's about 2 months behind GPT 5.5 and Opus 4.7.
As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.
It should be obvious now why Anthropic really doesn't want you to run local models on your machine.
Vibes > Benchmarks. And it's all so task-specific. Gemini 3 has scored very well in benchmarks for very long but is poor at agentic usecases. A lot of people prefering Opus 4.6 to 4.7 for coding despite benchmarks, much more than I've seen before (4.5->4.6, 4->4.5).
Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.
Apparently glm5.1 and qwen coder latest is as good as opus 4.6 on benchmarks. So I tried both seriously for a week (glm Pro using CC) and qwen using qwen companion. Thought I could save $80 a month. Unfortunately after 2 days I had switched back to Max. The speed (slower on both although qwen is much faster) and errors (stupid layout mistakes, inserting 2 footers then refusing to remove one, not seeing obvious problems in screenshots & major f-ups of functionality), not being able to view URLs properly, etc. I'll give deepseek a go but I suspect it will be similar. The model is only half the story. Also been testing gpt5.4 with codex and it is very almost as good as CC... better on long running tasks running in background. Not keen on ChatGPT codex 'personality' so will stick to CC for the most part.
Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.
That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.
They use VPN to access. Even Google Deepmind uses Anthropic. There was a fight within Google as to why only DeepMind is allowed to Claude while rest of the Google can't.
There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.
Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.
This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.
The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.
> According to evaluation feedback, its user experience is better than Sonnet 4.5, and its delivery quality is close to Opus 4.6's non-thinking mode, but there is still a certain gap compared to Opus 4.6's thinking mode.
For the curious, I did some napkin math on their posted benchmarks and it racks up 20.1 percentage point difference across the 20 metrics where both were scored, for an average improvement of about 2% (non-pp). I really can't decide if that's mind blowing or boring?
Claude4.6 was almost 10pp better at at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).
On a seperate note, I am guessing that all the new models have announced in the space of a few days because the time to train a model is the same for each AI company.
Which strikes me as odd - Inwoukd have assumed someone had an edge in terms of at least 10% extra GPUs.
The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.
Just tested it via openrounter in the Pi Coding agent and it regularly fails to use the read and write tool correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"
This is shockingly cheap for a near frontier model. This is insane.
For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.
Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.
> For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
It's doesn't seem all that out there compared to the other Chinese model price/performance? Kimi2.6 is cheaper even than this, and is pretty close in performance
This is just a random thought, but have you tried doing an 'agentic' pelican?
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily corellated with their in-harness ability, the latter which really matters.
I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.
I should try it again with the more recent models.
Being a bicycle geometry nerd I always look at the bicycle first.
Let me tell you how much the Pro one sucks... It looks like failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has different angle than the seat tube, so good luck lowering that.
This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro one uses 16 spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
I think the pelican on a bike is known widely enough that of seizes to be useful as a benchmark. There is even a pelican briefly appearing in the promo video of GPT-5, if I'm not mistaken https://openai.com/gpt-5/. So the companies are apparently aware of it.
At this point 'frontier model release' is a monthly cadence, Kimi 2.6 Claude 4.6 GPT 5.5, the interesting question is which evals will still be meaningful in 6 months.
Funny how Gemini is theoretically the best -- but in practice all the bugs in the interface mean I don't want to use it anymore. The worst is it forgets context (and lies about it), but it's very unreliable at reading pdfs (and lies about it). There's also no branch, so once the context is lost/polluted, you have to start projects over and build up the context from scratch again.
Feels like the real story here is cost/performance tradeoff rather than raw capability. Benchmarks keep moving incrementally, but efficiency gains like this actually change who can afford to build on top.
You can, but does it work well? I assume CC has all kinds of Claude specific prompts in it, wouldn't you be better with a harness designed to be model agnostic like pi.dev or OpenCode?
I've been using all Kimi K2.6, gpt-5.4 and now Deepseek v4 (thought not extensively yet) in Claude Code and I can say it works much better than you'd expect. It looks like the system prompt and tools are pulling a lot of weight. Maybe the current models are good enough that you don't need them to be trained for a specific harness.
I don't mind that High Flyer completely ripped off Anthropic to do this so much as I mind that they very obviously waited long enough for the GAB to add several dozen xz-level easter eggs to it.
It is great! I asked the question what I always ask of new models ("what would Ian M Banks think about the current state of AI") and it gave me a brilliant answer! Funny enough the answer contained multiple criticisms of his own creators ("Chinese state entities", "Social Credit System").
Actually the fact the inference of a SOTA model is completely Nvidia-free is the biggest attack to Nvidia every carried so far. Even American frontier AI labs may start to buy Chinese hardware if they need to continue the AI race, they can't keep paying so much money for the GPUs, especially once Huawei training versions of their GPUs will ship.
SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here
For flash? 4 bit quant, 2x 96GB gpu (fast and expensive) or 1x 96GB gpu + 128GB ram (still expensive but probably usable, if you’re patient).
A mac with 256 GB memory would run it but be very slow, and so would be a 256GB ram + cheapo GPU desktop, unless you leave it running overnight.
The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.
Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.
There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
Run on an old HEDT platform with a lot of parallel attached storage (probably PCIe 4) and fetch weights from SSD. You'd ultimately be limited by the latency of these per-layer fetches, since MoE weights are small. You could reduce the latencies further by buying cheap Optane memory on the second-hand market.
The low end could be something like an eBay-sourced server with a truckload of DDR3 ram doing all-cpu inference - secondhand server models with a terabyte of ram can be had for about 1.5K. The TPS will be absolute garbage and it will sound like a jet engine, but it will nominally run.
The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
lots of great stuff, but the plot in the paper is just chart crime.
different shades of gray for references where sometimes you see 4 models and sometimes 3.
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.
One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.
There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.
I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.
Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.
More like he wants to ban accelerator chip sales to China, which may be about “national security” or self preservation against a different model for AI development which also happens to be an existential threat to Anthropic. Maybe those alternatives are actually one and the same to him.
Using it with opencode sometimes it generates commands like:
bash({"command":"gh pr create --title "Improve Calendar module docs and clean up idiomatic Elixir" --body "$(cat <<'EOF'
Problem
The Calendar modu...
like generating output, but not actually running the bash command so not creating the PR ultimately. I wonder if it's a model thing, or an opencode thing.
How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.
Weren't there some frameworks recently released to allow Macs to stream weights from fast SSDs and thus fit way more parameters than what would normally fit in RAM?
I have never tried one yet but I am considering trying that for a medium sized model.
I've been calling that the "streaming experts" trick, the key idea is to take advantage of Mixture of Expert models where only a subset of the weights are used for each round of calculations, then load those weights from SSD into RAM for each round.
As I understand it if DeepSeek v4 Pro is a 1.6T, 49B active that means you'd need just 49B in memory, so ~100GB at 16 bit or ~50GB at 8bit quantized.
v4 Flash is 284B, 13B active so might even fit in <32GB.
The "active" count is not very meaningful except as a broad measure of sparsity, since the experts in MoE models are chosen per layer. Once you're streaming experts from disk, there's nothing that inherently requires having 49B parameters in memory at once. Of course, the less caching memory does, the higher the performance overhead of fetching from disk.
Streaming weights from RAM to GPU for prefill makes sense due to batching and pcie5 x16 is fast enough to make it worthwhile.
Streaming weights from RAM to GPU for decode makes no sense at all because batching requires multiple parallel streams.
Streaming weights from SSD _never_ makes sense because the delta between SSD and RAM is too large. There is no situation where you would not be able to fit a model in RAM and also have useful speeds from SSD.
These are more like experiments than a polished release as of yet. And the reduction in throughput is high compared to having the weights in RAM at all times, since you're bottlenecked by the SSD which even at its fastest is much slower than RAM.
But if it does, then in the following week we'll see DeepSeek4 floods every AI-related online space. Thousands of posts swearing how it's better than the latest models OpenAI/Anthropic/Google have but only costs pennies.
Then a few weeks later it'll be forgotten by most.
It's difficult because even if the underlying model is very good, not having a pre-built harness like Claude Code makes it very un-sticky for most devs. Even at equal quality, the friction (or at least perceived friction) is higher than the mainstream models.
If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.
The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.
"If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'."
I feel the same way. But look at the ollama vs llama.cpp post from HN few days back and you will see most of the enthusiasts in this space are very non technical people.
Open Source as it gets in this space, top notch developer documentation, and prices insanely low, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!
Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, Chinese ecosystem has delivered a complete AI stack. Like it or not, that's a big news. But what's there not to like when monopolies break down?
The incredible arrogance and hybris of the American initiated tech war - it is just a beautiful thing to see it slowly fall apart.
The US-China contest aside - it is in the application layer llms will show their value. There the field, with llm commoditization and no clear monopolies, is wide open.
There was a point in time where it looked like llms would the domain of a single well guarded monopoly - that would have been a very dark world. Luckily we are not there now and there is plenty of grounds for optimism.
Still not sure how I feel about China of all places to control the only alternative AI stack, but I guess it's better than leaving everything to the US alone. If China ever feels emboldened enough to go for Taiwan and the US descends into complete chaos, the rest of the world running on AI will be at the mercy of authoritarian regimes. At the very least you can be sure noone is in this for the good of the people anymore. This is about who will dominate the world of tomorrow. And China has officially thrown their hat in the ring.
I always find it an illuminating experience about the power of mass propaganda every time I see an American believe they somewhat have the moral high ground over China, despite starting a new war somewhere around the globe either for petrol or on behalf of Israel every six months.
Not American, but it undoubtedly does have moral high ground.
Chinas setup is amazing for moving from undeveloped to developed, with 30 years of momentum and everyone pulling in one direction.
That doesn't mean it's positive in human rights
> That doesn't mean it's positive in human rights
Isn't the US building mass detention camps right now for all the brown people there? And arresting / detaining / demanding papers from any and everyone? With federal agents killing civilians?
Don't get me wrong, China is also horrible here, they have their own camps.
But pretending the US is positive wrt human rights is a wild take in 2026.
[delayed]
Just because America is doing bad things doesn't mean China is good, or vice versa.
> Just because America is doing bad things doesn't mean China is good, or vice versa.
Of course not. When it comes to SOTA LLMs you have the choice between two bad options. For many, choosing the Chinese option is just choosing the lesser of two evils (and it's much cheaper).
The Uyghur say hi.
Didn’t say moral high ground, they said democratic. America is (for now) still a democracy. China is a dictatorship.
Not very democratic to invade other countries on the whim of a president.
> they said democratic
They didn't even say that. They only said China playing is "better than leaving everything to the US alone."
For now indeed, the people that want to get rid of it are currently in power.
The US was one of the first democracies in the world, and many countries followed suit. But the US hasn't kept up, and now the powers that be have exploited the weaknesses in the system. With arguably the biggest one being giving the president too much power (appointing supreme court justices, executive orders, etc).
Democracy in most of the countries is just theater. Trump promised no more wars iirc.
Don't get me wrong, I'd rather live in a country without a million cameras that automatically fine me for crossing the street illegally but I don't actually deceive myself in thinking my vote counts for much.
> I'd rather live in a country without a million cameras that automatically fine me for crossing the street illegally
Agreed, but there again, the democracies have surveillance capitalism, it's not exactly like we're not being tracked.
You let Trump and all the tech-bro shitheads win with that attitude unfortunately. Democracy is an ongoing battle.
Moral stances aside, I'd argue it's healthy that the US gets competition from abroad. I appreciate the boost that the world is getting from China - infrastructure and construction projects are a huge benefit to economies. Their focus on green energy has caused a huge influx of affordable solar panels, home batteries, EVs, etcetera, helping reduce the dependency on fossil fuels - while the US and especially the other big money spenders in the middle east would rather the world remain fully dependent on them. But for the past years Europe and now Asia are feeling the pain from being overly reliant on that.
China's policies and government aren't morally defensible and I do fear that they will become more aggressive in spreading their influence and policies onto other countries, but from an economic standpoint what they're doing is super effective. While the previous world power (the US) is stuck in infighting and going through cycles of fixing/undoing the previous administration's damages, instead of planning ahead.
Competition with the Soviet Union gave all the workers in the world better conditions, also advances in science and technology... (And risk of mutual destruction ;)), even if the USSR wasn't good.
Isn’t Mistral close in the ballpark?
AFAIK: Current Mistral models are not competitive with SOTA-models that come out of the USA or China. They are "good enough" for enterprise usage when you don't need SOTA performance.
Their main selling point is: They are neither US-American nor Chinese. That's a real moat in today's world. I think at the moment they feel quite comfortable.
They arent. Benchmark wise they are quite apart.
As a Brit I'm here for it to be honest, I'm tired of America with everything that's going on.
China is not perfect but a bit of competition is healthy and needed
I don’t know if we’re ahead of the curve but that tired feeling has started turning into hate here in the EU. I guess being threatened with invasion does that to you.
The next decade is going to look very different with America Alone.
I grew up in the states when I was younger, always feeling some closeness to Americans even after I moved back to Europe.
With all that goes on it has changed. Recently I sat on a plane near some Americans discussing their holidays here, and I noticed I felt contempt. Sitting their with insane privilege as their government torches the world.
Individuals remain individuals, and one really ought not to be prejudice. However the lack of resistance I see in in the “land of the free” as their “democratic” institutions collapse just makes me believe they never cared at all. In France cars are torched if the pension age is raised. In America the rise facism apparently doesnt matter to them.
I a European who spent the last decade in America and I'm not sure I'd call Americans privileged compared to Europe. With money being the one means you have to be treated well in society, comparing it to Europe, America feels like the hunger games. Want healthcare (ie surviving)? Healthy food? To own your house? Welcome to the games
From my small bubble it's not that. I'm Dutch, married to an American who now knows enough Dutch such that we can treat it as a secret language when we're in the US.
My family in law seems to swing slightly republican. As a Dutchie, I could get some answers because I'm too naive not to talk about politics. So I got to probe a bit. What I simply found was that they'd say "I can't trust the news, none of it. Not CNN, not Fox News, nothing". Then I'd say "well in the Netherlands, I'd argue that while news outlets have their bias, you can trust them on basic factual reporting". She looked at me with a stare that I could only describe as "oh but honey, you're too young and naive to understand". To which I thought "you don't know the Netherlands. We're not perfect but we're nowhere near as deranged as what I'm seeing here".
I think that explains a lot of it for some people. The trust in the media, all media, is completely broken. Trump has how many fellonies now? Can't trust it. Kamala is doing what now? All talk. DOGE is fixing the government? I fucking hope so! But can't trust the damn news. Whether they do or don't, they are always burning money, god damn bureaucrats.
I feel that's the mindset that my family in law has.
I think this is spot on. "Every fault of america is just how it is in any society.". Nice way to just accept it.
Out of curisoity, what is your wife's take?
My running hypothesis has been the trust breakdown arises from social-media overexposure driving lazy nihilism, which in turn gave free reign to a uniquely-corrupt class of politicians. But I'm not sure how to neutrally evaluate that.
not all of us are just "sitting here with insane privilege." it's quite dangerous for some of us right now.
I'm trans. this Administration does not like us. after Charlie Kirk's murder, things got legitimately scary. Musk was retweeting people who called us "deranged bioweapons" who needed to be "forcibly institutionalized." NSPM-7 is surveilling and infiltrating trans organizations. the Heritage Foundation proposed labeling us as "ideological extremists," in the same category as neo-Nazis. if I'm arrested, I'll go to a men's prison where I'll likely be given to a violent inmate as his cellmate to "pacify" him (V-coding.)
so yeah, I keep my head down. a lot of Jews kept their heads down in Germany in the '30s, you know? and just like then, it doesn't seem like other countries are too keen on taking us in as refugees. I hope that changes if things get bleak.
Get out seems an important priority. Good luck
It’s not that it doesn’t matter to Americans. It is worse; half the population (or at least, half the voting population), is thrilled with the development of fascism. The other half has been ringing the alarm bells for well over a decade; it seems to make no difference.
And you’re right, most Americans do not understand the privileges they have or give one single shit about democracy; it is just not a salient political issue. But eggs… don’t get me started on eggs.
It's probably a bit more nuanced than "half this, half that"; when you look at the facts, most voters aren't that extremist. A lot of votes vote one way or the other because they would simply never vote for the other.
This is why the swing voters / swing states are so important in the US, because only a few million are flexible enough to switch sides.
Of course the core issue is that there's a two party system; while I'm sure that in a healthy democracy the current republican and democrat parties would be the bigger ones, they wouldn't have a majority.
> In France cars are torched if the pension age is raised.
This is not something to be proud of. You guys are giving yourself loaned freebies, retiring 5+ (!) years earlier than countries like BeNeLux and Germany, and are pretty much expecting the EU to eventually pick up the pieces which will drag us all down.
Edit: always lovely when HN downvotes truths :)
That's bullshit. Pensions are not a zero-sum game, and other countries don't have to pay for them.
It just doesn't make sense to delay retirement while youth unemployment is such a big problem. We ALL should be fighting like France, in many aspects.
Other countries don't directly pay for the pensions, but France is staring into a giant fiscal abyss because of their low retirement age (and other generous social benefits). Any attempt to change those results in the country being taken hostage by rioters, thus nothing changes.
At some point France will be in too deep shit in look to the EU to cover for them. We will all pay for that. And it is deeply unfair because other countries their citizens have accepted later retirement and more frugal benefits to keep their country fiscally healthy.
France could cover the fiscal hole in other ways, but taxing corporations and wealth higher also consistently ends up being blocked. And each year the hole gets deeper.
The reason you see the distancing from America in European media is the Republican government. Democrats were not better to Europe than Republicans, but you have to understand that current European leaders are vassals to the US Democrats and the Epstein-List".
In the near future, one of two things will happen:
- Democrats will regain power, and the EU will be happy with the USA again.
- EU regime will swear an oath to the Republicans and continue to be vassals to them and the Epstein-List.
Whatever happens, one thing is sure: Epstein-List will stay in control.
america is a continent. let’s take back our vocabulary (fellow european here). the little orange man shows very well what i mean when he started giving names to the gulf of mexico.
"not perfect" is a _very_ big simplification of what China is though
Isn't that the same to every major superpower?
Whatbaoutism at it's finest.
Have a peek at the fredom indx and the press freedom index for China. Guess where they stand?
You know about the chinese internet firewall.
You can't trust any data from the CCP.
And please don't equate the aberration that is the Trump administration with "regular" US administrations (and this is coming from a non US person).
Regular US administrations that commited war crimes in half the world for decades. But apparently it only matters what they do in the US.
People in China live under totalitarian rule, that much is true.
But how free is the average North American, where getting sick can bring you and your family financial ruin? Where the "free press" is controlled by corporations who are also the main source of campaign funding for politicians? Where their urban spaces are designed to require you to have a car and promote complete atomized individuals?
You’re right, for now, but I think trump will try to turn America into a dictatorship.
..you forgot to mention that any technology in China, foreign or domestic, can and will be used for and to the benefit of the -military- party.. But like someone posted: "not perfect" fits the bill.
Check out the Sean Ryan Show with Palmer Luckey on China and military tech.
No. There is no moral equivalence with totalitarianism.
Modern China isn't exactly totalitarian though and US is rapidly converging with China in that regard anyway.
How totalitarian is exactly totalitarian? I asked chatgpt and it gave few points
- Control goes beyond politics
- A single, all-encompassing ideology
- No meaningful private sphere
- Mass mobilization and propaganda
- Extensive surveillance and repression
Seems like China is ticking all the boxes.
China is not totalitarian. Many people believe that China is still like 1950s-60s-era Maoist China, but it's just not.
tiananmen square was in 1989. Hong Kong was snuffed out like a light. Covid saw people caged and sealed in their houses. You do not need to look back at the cultural revolution to see the prc for what it is.
Is your contention that Hong Kong is also a totalitarian society? Have you been to Hong Kong in the last 5 years? I feel like people saying these sorts of things are just completely divorced from reality.
> Covid saw people caged and sealed in their houses.
No. There were a few incidents very early on, when everyone was (quite understandably) panicking about a new, deadly virus that nobody had ever seen before, when some local city officials barred the doors of people who had just come from Wuhan. That was a scandal inside China, and it was immediately reversed.
What China did do quite extensively was border quarantine, and during localized outbreaks (caused by cases that slipped through quarantine at the border), mass testing and quarantine measures. This was during a once-in-a-generation pandemic that killed millions of people. In China, these measures saved several million lives. The estimates are that China's overall death rate was about 25% that of the US, and these measures are the reason. By the way, Taiwan and Australia took nearly identical measures, and I very much doubt that you would call them totalitarian societies.
Which are the current nontotalitarian superpowers?
That's also the current US administration.
Luckily laws still stand somewhat.
( And Trump ain't smart enough)
You can say the same about the US
they compare it to fascist USA though
Ask a gay, a black or a Japanese how it feels living in China.
Fellow countryman here. I came here to say the same thing
As a different Brit I do not accept such moral relativism.
China’s governments actions are on a completely different level - for example:
“””
Since 2014, the government of the People's Republic of China has committed a series of ongoing human rights abuses against Uyghurs and other Turkic Muslim minorities in Xinjiang which has often been characterized as persecution or as genocide.
“”” https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
https://www.amnesty.org/en/location/asia-and-the-pacific/eas...
Yes Trump is clearly trying Totalitarianism in America, but it is orders of magnitude different from what is happening in China.
Why do we ignore all the human right abuses the US perform abroad? Iraq, Afghanistan, now Iran, Gaza and Lebanon through Israel, support to Saudi Arabia (which would not exist without the US), El Salvador... And inside it's also horrible with its treatment to immigrant.
That should be at least comparable (if not worse) than what China is doing.
Yes, El Salvador is so evil for imprisoning dangerous criminals and protecting innocent lives.
genocide right, that's why Uighurs were allowed to have more children than majority of Chinese Han population /facepalm
by your logic gentrification of neighborhoods with different people moving in is genocide as well
Btw. remind me when last tiem China bombed school and killed 150+ school girls as your friend US?
Or as Brit I hope you are proud about all the killing your country participated in in illegal invasion to Iraq based on fake news about WMD.
It's 2026 and people still believe this Uyghur genocide propaganda? In the meantime, Israel and the US have been killing people in the middle east for years, but china is "on a completely different level"?
This is such a tired argument, and morally repugnant. Where is the UK in the race, where is the EU? Lets get of our asses and stop moralizing.
(China wiped out the entire EU industry through a "quiet" trade war since like the last 15 years, and we're not really talking about that aren't we...)
Not so much a trade war as basic economic forces, and it's been going on for much longer than that. When infrastructure improves, companies and customers can look further to get their stuff done. If it's cheaper to do your industrial or manufacturing work abroad and have it transported to your country, that just happens.
The powers that be try to slow this down by banning imports outright (you can't for example import American chicken into Europe because of food safety laws), or high import taxes (Chinese EVs have a 50% import tax in Europe and the US to protect the local car manufacturers. Which is fair because the Chinese EV manufacturers are state-sponsored so their prices are unfair. Then again, western companies get billions in investor money to push the prices down).
UK has the people but not the electric grid/infrastructure to compete.
EU/France has Mistral.
You mean the west handed their industry to china over the last 15 years? Its not like the US is any better off in this. The EU is not a country, so you can't talk about it as if it was. Each country has their own companies and industries. There is AI in Europe, and its growing, however we might not be as "energetic" about destroying our countries to build giant data centers to serve our billionaire overlords. That does not mean that there is no investment, there is, including a bunch of American corporations like Amazon. But there is also a lot of corruption and bribing (lobbying - lets call it what it really is, no more whitewashing) going on around that too.
So again, stop referring to EU as a country, we are not, and it just annoys any Europeans as it comes of as "Americans who don't understand the world outside of the USA".
> It runs entirely on Huawei chips.
They may say that, but we can't assume that. Given it's Chine, we actually have to assume it's mostly false. And since there'll never be a proper audit by an independent party of all of their data centers, we'll never know.
This is so unbelievable racist and deranged.
Let's see how long it takes before the big US AI companies start lobbying to outright ban use of Chinese AI, even the open source / local models. For "national security" reasons, of course.
Hopefully the US’ self imposed isolation will mean that when they do, they aren’t able to force the rest of the world to follow suit.
"Open Source" is the ultimate romance understood by software engineers.
I can't find any info on what exactly is open sourced.
And in any case what does open source actually mean for an llm? It's not like you can look inside it to see what it's doing.
For me open source means that the entire training data is open sourced as well as the code used for training it otherwise it's open weight. You can run it where you like but it's a black box. Nomic's models are good example of opensource.
But remember to not ask about Taiwan!
> China asks other country not to meddle with internal separatism > They also dont support separatism in my country
Understandable.
you talk like there isn't censorship in american AIs, like Israel topics.
To be fair I prefer the Chinese models censorship (yes, seriously) because if you ask certain topics they just don't answer instead of giving skewed answers.
Quit a bit better then made to bomb little girl schools in Iran.
Just ask it for a summary of the USA’s role in Iran, Gaza, Lebanon and its recent threats against Panama, Cuba and Greenland! It might be able to keep track.
Does all this insane behavior from the US justify the Chinese censorship?
Are you implying that western models were manipulated to hide and distort those events, like they do with the Tiananmen Square event, and Taiwan?
Let's say I'm more outraged by the actual events.
History is by definition his story.
It's not. It's an English pun on a Greek word, which roughly means "investigation".
Ask Gemini today if the United States is trying to destroy the nation of Iran, and it will feed you the (white-washed) party line, straight from the White House, with a bit of 'some people disagree' thrown in. No mention of America's threats of "Complete annihilation", "Killing a civlization", and all the rest.
> Summary: The U.S. is currently engaged in an active war aimed at dismantling the Iranian government and its military capabilities, but it distinguishes this from destroying the country or its people. However, the humanitarian impact—including civilian casualties from airstrikes and the domestic crackdown by Iranian security forces—has led many international observers to warn that the campaign risks long-term instability and "state collapse" rather than a simple transition of power.
It does do quite a bit better if you ask it about the genocide in Gaza, summarizing the case for it, and citing only token justifications from the guilty party.
As of April 2026, Gemini is... For very obvious reasons, highly biased towards cultural consensus. If your cultural consensus is strong on some really messed up things, that's the outcome that it's going to give you.
Isn't there a difference between the models output reflecting the mean of public discourse and the active adjustment of information by the government?
Irrespective of how close the outcomes are to the actual facts, those two things have a different quality, don't they?
> Isn't there a difference between the models output reflecting the mean of public discourse and the active adjustment of information by the government?
Not as much a difference as you would wish, as mean of public discourse is very actively managed, to our collective detriment, by a very small group of powerful people, which often includes the government. It's the nature of mass media.
They Thought They Were Free, and all that. By the time the 'mean of public discourse' centers on something incredibly stupid or awful, nobody can be arsed to figure out who planted that idea in our heads.
I sometimes wonder if there are any security risks with using Chinese LLMs. Is there?
Theoretically yes. It is entirely possible to poison the training data for a supply chain attack against vibe coders. The trick would be to make it extremely specific for a high value target so it is not picked up by a wide range of people. You could also target a specific open source project that is used by another widely used product.
However there is so many factors involved beyond your control that it would not be a viable option compared to other possible security attacks.
But propaganda or non ethical marketing - why not? (That is bias toward pointing to certain provider(s)).
If there is, couldn't they exist in any model?
I don't mean that flippantly. These things are dumped in the wild, used on common (largely) open source execution chains. If you find a software exploit, it's going to affect your population too.
Wet exploits are a bit harder to track. I'd assume there are plenty of biases based on training material but who knows if these models have a MKUltra training programme integrated into them?
Backdooring software at scale.
Spearphishing.
Building reliance and exploiting it, through state subsidies, dumping, and market manipulation.
Handicapping provision to the west for competitive advantage.
What about LLMs from other origins? What makes them less risky?
From my experience, kinda the opposite? It's like Chinese software is... Harder to weaponize or hurt yourself on. Deepseek is definitely censored, but I've never caught it being dishonest in a sneaky way.
There must be. The executives at my company wouldn't have banned them all for no reason after all.
Is this a serious comment? It honestly reads like the last famous words.
Of course there are risks.
In their paper, point 5.2.5 talks about their sandboxing platform(DeepSeek Elastic Compute). It seems like they have 4 different execution methods: function calls, container, microVM and fullVM.
This is a pretty interesting thing they've built in my opinion, and not something I'd expect to be buried in the model paper like this. Does anyone have any details about it? Google doesn't seem to find anything of note, and I'd love to dive a bit deeper into DSec.
There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting.
I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running these problems through first (often with about 2-6 papers for context) and then requesting a rigorous proof as followup. Since the problems are pretty tough, there is no quantitative measure of performance here, I'm just judging based on how useful the output is toward outlining a solution that would hopefully become publishable.
Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight for some of the harder problems (insightful guesses on relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single followup prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insights in the first response as Gemini (closer to GPT-5), but it often gets much better in the followup, and the proofs can be _very_ impressive; nearly complete in several cases.
Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.
Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!
Wondering how gpt 5.5 is doing in your test. Happy to hear that DeepSeek has good performance in your test, because my experience seems to correlate with yours, for the coding problems I am working on. Claude doesn't seem to be so good if you stray away from writing http handlers (the modern web app stack in its various incarnations).
Curious to know what kind of problems you are talking about here
I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):
- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.
- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.
- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.
- One problem pertaining to bounds on integral probability metrics for time-series modelling.
It would be wonderful to have a deeper insight, but I understand that you can disclose your identity (I understand that you work in applied research field, right ? )
Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good??
https://api-docs.deepseek.com/guides/thinking_mode
No BS, just a concise description of exactly what I need to write my own agent.
I am very partial to Mistral's API docs https://docs.mistral.ai/api
It's because they're optimizing for a different problem.
Western Models are optimizing to be used as an interchangeable product. Chinese models are being optimizing to be built upon.
>Western Models are optimizing to be used as an interchangeable product.
But so much investment in their platforms, not just their APIs?
> Western Models are optimizing to be used as an interchangeable product
Why? It sounds like the stupidest idea ever. Interchangeability = no lock-in = no moot.
First you clone the API of the winner, because you want to siphon users from its install-base and offer de-risked switch over cost.
Now that you’re winning, others start cloning your API to siphon your users.
Now that you’re losing, you start cloning the current winner, who is probably a clone of your clone.
Highly competitive markets tend to normalize, because lock-in is a cost you can’t charge and remain competitive. The customer holds power here, not the supplier.
Thats also why everyone is trying to build into the less competitive spaces, where they could potentially moat. Tooling, certs, specialized training data, etc
Our (western) economic model forces competing individual companies to be profitable quickly. China can ignore DeepSeek losing money, because they know developing DeepSeek will help China. Not every institution needs to be profitable.
yes, they want to win the same way they won more or less every other economic competition in the last 30 years, scale out, drop prices and asphyxiate the competition.
Yeah, it’s an interesting one. I think inertia and expectations at this point? I don’t think the big labs anticipated how low the model switching costs would be and how quickly their leads would be eroded (by each other and the upstarts)
They are developing their moats with the platform tooling around it right now though. Look at Anthropic with Routines and OpenAI with Agents. Drop that capability in to a business with loose controls and suddenly you have a very sticky product with high switching costs. Meanwhile if you stick with purely the ‘chat’ use cases, even Cowork and scheduled tasks, you maintain portability.
They are all racing to AGI. They aren't designing them to be interchangeable they just happen to be.
No, they are not. If they were "racing to AGI" they would be working together. OpenAI would still be focused on being a non-profit. Anthropic wouldn't be blocking distillation on their models.
If by AGI you mean IPO, sure. I genuinely don't believe Dario nor Sam should be trusted at this point. Elon levels of overpromising and underdelivering.
If you want other people to know whether you're being genuine or sarcastic, you'll have to put a bit more effort into your comments. Your comment just adds noise.
What da?
Meanwhile, they don't actually say which model you are running on Deepseek Chat website.
You might enjoy Z.ais api docs aswell
Western orgs have been captured by Silicon Valley style patrimonialism, and aren’t based on merit anymore.
It's interesting that they mentioned in the release notes:
"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."
https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...
>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead
Pretty cool, I think they're the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.
Deepseek v4 is basically that quiet kid in the back of the class who never says a word but casually ruins the grading curve for everyone else on the final exam.
> pricing "Pro" $3.48 / 1M output tokens vs $4.40
I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
edit: $1.74/M input $3.48/M output on OpenRouter
API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.
Research and training costs have to be amortized from somewhere; and labs are always training. I'm definitely keen for the financials when the two files for IPO though, it would be interesting to see; although I'm sure it won't be broken down much.
This price is high even because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 computing cards are launched in the second half of the year, the price of the Pro version will drop significantly
In six month deepseek won't be sota anymore und usage will be wayyyy down.
Well, if they distilled once…
They are profitable to opex costs, but not capex costs with the current depreciation schedules, though those are now edging higher than expected.
Amazingly, the current depreciation overestimates the retained value of GPUs.
In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating signficant income.
Coreweve has upped their depreciation for GPUs to 6 years(!) now, which seems more realistic.
https://www.silicondata.com/blog/h100-rental-price-over-time
It's the decades of performance doesn't matter SV/web culture. I'd be surprised if over 1% of OpenAI/Anthropic staff know how any non-toy computer system works.
I was thinking the same. How can it be than other providers can offer third-party open source models with roughly the similar quality like this, Kimi K2.6 or GLM 5.1 for 10 times less the price? How can it be that GPT 5.5 is suddenly twice the price as GPT 5.4 while being faster? I don't believe that it's a bigger, more expensive model to run, it's just they're starting to raise up the prices because they can and their product is good (which is honest as long as they're transparent with it). Honestly the movement about subscription costing the company 20 times more than we're paying is just a PR movement to justify the price hike.
I'm pretty sure OpenAI and Anthropic are overpricing their token billed API usage mainly as an incentive to commit to get their subscriptions instead.
Anthropic recently dropped all inclusive use from new enterprise subscriptions, your seat sub gets you a seat with no usage. All usage is then charged at API rates. It’s like a worst of both worlds!
What's the point then? Special conditions for data retention/non-training policies?
SSO Tax is a large part of it, controls around plug-in marketplace, enforcement of config, observeability of spend. But it’s all pretty weak really for $20 a month.
And Microsoft are going the same route to moving Copilot Cowork over to a utilisation based billing model which is very unusual for their per seat products (I’m actually not sure I can ever remember that happening).
The target audience for the APIs is third party apps which are not compatible with the subscriptions.
True. I missed that.
My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.
Google stated a while back, that with tpus they are able to sell at cost / with profit.
Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.
And they actually say the prices will be "significantly" lower in second semester when Huawei 650 chips comes in.
I haven't seen anyone claiming that API prices are subsidized.
At some point (from the very beginning till ~2025Q4) Claude Code's usage limit was so generous that you can get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h window) - and for good reason, because LLM agentic coding is extremely token-heavy, people simply wouldn't return to Claude Code for the second time if provided usage wasn't generous or every prompt costs you $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limit in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.
The claim that APIs are subsidized is very common.
eg:
Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.
https://news.ycombinator.com/item?id=47684887
(the claims don't make any sense, but they are widely held)
Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open weight models is super exciting.
I'm still playing with the new Qwen3.6 35B and impressed, now DeepSeek v4 drops; with both base and instruction-tuned weights? There goes my weekend :P
They’ve also announced Pro price will further drop 2H26 once they have more HUAWEI chips.
Insert always has been meme.
But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.
Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.
Point taken but there isnt any western providers there yet. Power is cheaper in china.
These models are open and there are tons of western providers offering it at comparable rates.
As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.
Is there evidence that frontier models at anthropic, openai or google or whatnot are not using comparable optimizations to draw down their coats and that their markup is just higher because they can?
I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.
> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
One answer - Chinese Communist Party. They are being subsidized by the state.
There's something heartwarming about the developer docs being released before the flashy press release.
Their audience is people who build stuff, techs audience is enterprise CEOs and politicians, and anyone else happy to hype up all the questionably timed releases and warnings of danger, white collar irrelevence, or promises of utopian paradise right before a funding round.
Insert obligatory "this is the way" Mando scene. Indeed!
Where's the training data and training scripts since you are calling this open source?
Edit: it seems "open source" was edited out of the parent comment.
They are exactly open source. The training data is the internet. Don't say it's on the internet. It IS the internet.
The training scripts are in Megatron and vLLM.
doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?
no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.
There is an easy fix already in widespread use: "open weights".
It is very much a valuable thing already, no need to taint it with wrong promise.
Though I disagree about being used if it was indeed open source: I might not do it inside my home lab today, but at least Qwen and DeepSeek would use and build on what eg. Facebook was doing with Llama, and they might be pushing the open weights model frontier forward faster.
Yeah, open weights is really good, especially when base models (not just the instruction tuned) weights are released like here.
Nvidia did with Nemo.
And they got sued :
https://www.reuters.com/technology/nvidia-is-sued-by-authors...
it's not a gotcha but people using words in ways others don't like.
It's not about likes, it's a flat out lie.
Aww yes, let me push a couple petabytes to my git repo for everyone to download...
An easier thing would be to say "open weights", yes.
Weights are the source, training data is the compiler.
You got it the wrong way round. It's more akin to.
1. Training data is the source. 2. Training is compilation/compression. 3. Weights are the compiled source akin to optimized assembly.
However it's an imperfect analogy on so many levels. Nitpick away.
It's dataset [0] released under some source available license or OSI license, ie. open dataset or open source dataset.
[0] https://news.ycombinator.com/item?id=47758408
So, this is the version that's able to serve inference from Huawei chips, although it was still trained on nVidia. So unless I'm very much mistaken this is the biggest and best model yet served on (sort of) readily-available chinese-native tech. Performance and stability will be interesting to see; openrouter currently saying about 1.12s and 30tps, which isn't wonderful but it's day one after all.
For reference, the huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to nVidia's H100 from 2022. In other words, things are hotting up in the GPU war!
Can't see how NVIDA justifies its valuation/forward P/E ratio with these developments and on-device also becoming viable for 98% of people's needs when it comes to AI
On-device is incredibly far away from being viable. A $20 ChatGPT subscription beats the hell out of the 8B model that a normal computer can run.
Nvidia's forward PE ratio is only 20 for 2026. That's much lower than companies like Walmart and Costco. It's also growing nearly 100% YoY and has a $1 trillion backlog.
I think Nvidia is cheap.
I do think Nvidia isn't that badly priced; they still have the dominance in training and the proven execution
Biggest risk I see is Nvidia having delays / bad luck with R&D / meh generations for long enough to depress their growth projections; and then everything gets revalued.
Great! Can't wait to buy decent GPU for interference for <1k$
While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%.
Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.
SWE-bench Verified is, at this point, contaminated https://openai.com/index/why-we-no-longer-evaluate-swe-bench...
So it os hard to tell how much of a model gain is due to skill, and how much - overfitting.
For those who rely on open source models but don't want to stop using frontier models, how do you manage it? Do you pay any of the Chinese subscription plans? Do you pay the API directly? After GPT 5.5 release, however good it is, I am a bit tired of this price hiking and reduced quota every week. I am now unemployed and cannot afford more expensive plans for the moment.
Have you considered... not subscribing? You can ask the top models via chats for specific stuff, and then set up some free CLI like mistral.
If you're trying to make a buck while unemployed, sure get a subscription. Otherwise learn how to work again without AI, just focus on the interesting stuff.
I have $20 ChatGPT subscription. Stopped Anthropic $20 subscription since the limit ran out too fast. That's my frontier model(s).
For OSS model, I have z.ai yearly subscription during the promo. But it's a lot more expensive now. The model is good imo, and just need to find the right providers. There are a lot of alternatives now. Like I saw some good reviews regarding ollama cloud.
I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.
Don't keep up. Much like with news, you'll know when you need to know, because someone else will tell you first.
The players barely ever change. People don't have problems following sports, you shouldn't struggle so much with this once you accept top spot changes.
I didn't express this well but my interest isn't "who is in the top spot", and is more _why and _how various labs get the results they do. This is also magnified by the fact that I'm not only interested in hosted providers of inference but local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor it's more than following the national leagues but also following college and even high school leagues as well. And the real interest isn't even who's doing well but WHY, at each level.
The technical report discussing the why and how is here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Follow the AI newsletters. They bundle the news along with their Op-Ed and summarize it better.
Can you suggest some good ones?
https://jack-clark.net/
It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.
At this point I would just pick the one who's "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
I find ChatGPT annoying mostly
Open settings > personalization. Set it to efficient base style. Turn off enthusiasm and warmth. You’re welcome
It honestly has all kinda felt like more of the same ever since maybe GPT4?
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
For coding Opus 4.5 in q3 2025 was still the best model I've used.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
holy shit im right there with you
Already on Openrouter. Pro version is $1.74/m/input, $3.48/m/output, while flash $0.14/m/input, 0.28/m/output.
The Pro model is giving 429 Overload errors
Getting 'Api Error' here :( Every other model is working fine.
Try interacting with it through the website, it will give an error and some explanation on the issue. I had to relax my guardrail settings.
https://openrouter.ai/deepseek/deepseek-v4-pro
https://openrouter.ai/deepseek/deepseek-v4-flash
Its on OR - but currently not available on their anthropic endpoint. OR if you read this, pls enable it there! I am using kimi-2.6 with Claude Code, works well, but Deepseek V4 gives an error:
`https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist"
Anyone worked out how much hardware one needs to self host this one?
For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.
I wonder why there aren't more open weights model with support for prompt caching on OpenRouter.
It is tricky to build good infrastructure for prompt caching.
Truly open source coming from China. This is heartwarming. I know if the potential ulterior motives.
American companies want a scan of your asshole for the privilege of paying to access their models, and unapologetically admit to storing, analyzing, training on, and freely giving your data to any authorities if requested. Chinese ulteriority is hypothetical, American is blatant.
It’s not remotely hypothetical you’d have to be living under a rock to believe that. And the fusion with a one-party state government that doesn’t tolerate huge swathes of thoughtspace being freely discussed is completely streamlined, not mediated by any guardrails or accountability.
This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
As a non-American, everything you wrote other than "one party" applies to the current US regime.
Relatively speaking, DeepSeek is less untrustworthy than Grok.
When I try ChatGPT on current events from the White House it interprets them as strange hypotheticals rather than news, which is probably more a problem with DC than with GPT, but whatever.
> And the fusion with a one-party state government that doesn’t tolerate huge swathes of thoughtspace being freely discussed
That would be a great argument if the American models weren’t so heavily censored.
The Chinese model might dodge a question if I ask it about 1-2 specific Chinese cultural issues but then it also doesn’t moralize me at every turn because I asked it to use a piece of security software.
The USA has one of the highest percentages of their population in prison.
Even for minor stuff like beeing addicted to drugs.
Looks pretty totalitarian to me.
And in China the state can harvest your organs for political crimes or even just being the wrong religion.
Not quite the same.
I think you're going to need to provide sources for such an outrageous and unbelievable claim.
I was curious as this is something commonly mentioned in all sorts of western media.
Quick google top link
https://en.wikipedia.org/wiki/Forced_organ_harvesting_from_F...
I’ll be sure to pick up my copy of the peoples daily to read about those statistics in the morning.
Do you really trust China’s stats on prison population?
Note: you can have this conversation criticizing the US on a US website. Try criticizing Xi or the CCP or calling him Pooh on a Chinese website.
You think China doesn’t imprison drug users?
China recently executed a low level drug trafficker
https://www.lemonde.fr/en/international/article/2026/04/05/c...
China is one of the top executioners. China executes more than rest of the world combined
https://www.amnesty.org/en/latest/news/2017/04/china-must-co...
You think China is honest about political prisoners in Tibet and Xinjiang?
Criticize the US all you want but I can’t understand the whitewashing of a real totalitarian and genocidal state like mainland China.
>This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
yes, this is exactly what I'm saying.
It’s an open model? So you can run it yourself if you want to
The oppression of people in China like Uyghurs and Hong Kong, the complete lack of free speech, the saber-rattling at neighbours, and the lack of respect for intellectual property are indeed all well documented.
But for folks on the opposite side of the world, the threats are more like "they're selling us electric cars and solar panels too cheaply" and the hypothetical "these super cheap CCTV cameras could be used for remote spying"
> This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
This is why I’ve been urging everyone I know to move away from American based services and providers. It’s slow but honest work.
And you're saying Americans aren't banned from criticising their elites?
Donald trump is a terrible president and looks like Winnie the Pooh. Keir Starmer is useless and a liar.
Feel free to go post similar on Chinese social media about their leaders.
Pretty sure you guys have a strong laws about free-speech, and criticizing elites is part of that. Though there are some groups that do not really want the 1st amendment to be a thing.
> Though there are some groups that do not really want the 1st amendment to be a thing.
The executive branch?
That would be a naïve perspective.
Foreigners are literally being denied entry into the country due to opposing viewpoints expressed on social media. People have to disable FaceID on their phones prior to going through customs in case an agent decides to investigate whether their political views are in opposition to the current administration.
> And you're saying Americans aren't banned from criticising their elites?
Half the country would be locked up right now if they weren’t allowed to criticize Trump. Have you even paid attention to how much he’s shitted on, on a daily basis?
As someone with Tibetan friends and as someone from India, Chinese ulterior motives are way more clear.
Same as USA. Happy to see some competition.
It's a little sad that tech now comes down to geopolitics, but if you're not in the USA then what is the difference? I'm Danish, would I rather give my data to China or to a country which recently threatened the kingdom I live in with military invasion? Ideally I'd give them to Mistral, but in reality we're probably going to continue building multi-model tools to make sure we share our data with everyone equally.
I don’t care about whatever “ulterior motives” they might have
My country’s per capita income is $2500 a year. We can’t pay perpetual rent to OAI/Anthropic
Same
if you want to understand why labs open source their models: http://try.works/why-chinese-ai-labs-went-open-and-will-rema...
> Internet comments say that open sourcing is a national strategy, a loss maker subsidized by the government. On the contrary, it is a commercial strategy and the best strategy available in this industry.
This sounds whole lot like potatoh potahto. I think the former argument is very much the correct one: China can undercut everyone and win, even at a loss. Happened with solar panels, steel, evs, sea food - it's a well tested strategy and it works really well despite the many flavors it comes in.
That being said a job well done for the wrong reasons is still a job well done so we should very much welcome these contributions, and maybe it's good to upset western big tech a bit so it's remains competitive.
It is not only that Chinese labs can undercut on price. It is that they must. They must give away their models for free by open sourcing them, and they must even give away free inference services for people to try them. That is the point of the post.
There is not ‘must’ here, they did not ‘have’ to undercut every other strategically and technologically important industry the rest of the world has, but they did as a point of national policy.
American industry has been on a downward spiral since the early 1960s….
Do they also open-source censoring filter rules? Like, you can't ask what happened at Tiananmen Square in 1989.
Open weight!
Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there. And their papers are extremely pro-social to help the broader open AI community. This is why they struggle getting funded because investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.
Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels
Others worth mentioning:
https://github.com/deepseek-ai/DeepGEMM a competitive foundational library
https://github.com/deepseek-ai/Engram
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/deepseek-ai/DeepSeek-R1
https://github.com/deepseek-ai/DeepSeek-OCR-2
They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.
DeepSeek's models are indeed open weight. Why do you feel that pointing this out would be considered slander?
It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology requires to reproduce these weights.
So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.
This model is as open source as a windows XP installation ISO.
> These are open weights, not open source.
Did you even read my comment?
Weights are the source, training data is the compiler
Training data == source code, training algorithm == compiler, model weights == compiled binary.
Training algorithm is the programmer, weights are the code that you run in an interpreter
isn't it more like the data is the source, the training process is the compiler, and the weights are the binary output.
> I know if the potential ulterior motives.
And you think the US tech giants don't have any ulterior motives?!
I think their motives are pretty transparent, as are china’s, as ever, you have to pick the lesser of two evils.
Weights available here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base
And we got new base models, wonderful, truly wonderful
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.
I don't think we need to compare models to Opus anymore. Opus users don't care about other models, as they're convinced Opus will be better forever. And non-Opus users don't want the expense, lock-in or limits.
As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.
Which model's best depends on how you use it. There's a huge difference in behaviour between Claude and GPT and other models which makes some poor substitutes for others in certain use cases. I think the GPT models are a bad substitute for Claude ones for tasks such as pair-programming (where you want to see the CoT and have immediate responses) and writing code that you actually want to read and edit yourself, as opposed to just letting GPT run in the background to produce working code that you won't inspect. Yes, GPT 5.4 is cheap and brilliant but very black-box and often very slow IME. GPT-5.4 still seems to behave the same as 5.1, which includes problems like: doesn't show useful thoughts, can think for half an hour, says "Preparing the patch now" then thinks for another 20 min, gives no impression of what it's doing, reads microscopic parts of source files and misses context, will do anything to pass the tests including patching libraries...
Agree with your assessment, I think after models reached around Opus 4.5 level, its been almost indistinguishable for most tasks. Intelligence has been commoditized, what's important now is the workflows, prompting, and context management. And that is unique to each model.
Same for me. There are tasks when I want the smartest model. But for a whole lot of tasks I now default to Sonnet, or go with cheaper models like GLM, Kimi, Qwen. DeepSeek hasn't been in the mix for a while because their previous model had started lagging, but will definitely test this one again.
The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.
This is not true for some cases e.g. there are stark differences in the correctness of answers in certain type of case work.
Is Opus nerfed somehow in Copilot? Ive tried it numerous times, it has never reallt woved me. They seem to have awfully small context windows, but still. Its mostly their reasoning which has been off
Codex is just so much better, or the genera GPT models.
Opus just got killed in Copilot. I always found it great, FWIW.
https://github.blog/news-insights/company-news/changes-to-gi...
I found Opus 4.7 to be actually worse than Opus 4.6 for my use case
Substantially worse at following instructions and overoptimized for maximizing token usage
This resonates with me a lot.
I do some stuff with gemini flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company
What do you run these on? I've gotten comfortable with Claude but if folks are getting Opus performance for cheaper I'll switch.
You can just use Claude Code with a few env vars, most of these providers offer an Anthropic compatible API
Try Charm Crush first, it's a native binary. If it's unbearable, try opencode, just with the knowledge your system will probably be pwned soon since it's JS + NPM + vibe coding + some of the most insufferable devs in the industry behind that product.
If you're feeling frisky, Zed has a decent agent harness and a very good editor.
actually this is not the reason - the harness is significantly better. There is no comparable harness to Claude Code with skills, etc.
Opencode was getting there, but it seems the founders lost interest. Pi could be it, but its very focused on OpenClaw. Even Codex cli doesnt have all of it.
which harness works well with Deepseek v4 ?
What's the issue with OC? I tried it a bit over 2 months ago, when I was still on Claude API, and it actually liked more that CC (i.e. the right sidebar with the plan and a tendency at asking less "security" questions that CC). Why is it so bad nowadays?
eh idk. until yesterday opus was the one that got spatial reasoning right (had to do some head pose stuff, neither glm 5.1 nor codex 5.3 could "get" it) and codex 5.3 was my champion at making UX work.
So while I agree mixed model is the way to go, opus is still my workhorse.
How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6
The thing is, it doesnt need to beat 4.7. it just needs to do somewhat well against it.
This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.
> you can download it, run it on your systems
In theory, sure, but as other have pointed out you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.
In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.
There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.
And that GPU wouldn’t run one instance, the models are highly parallelizable. It would likely support 10-15 users at once, if a company oversubscribed 10:1 that GPU supports ~100 seats. Amortized over a couple years the costs are competitive.
> There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.
Obviously, and certainly companies do run their own models because they place some value on data sovereignty for regulatory or compliance or other reasons. (Although the framing that Anthropic or OpenAI might "steal their data" is a bit alarmist - plenty of companies, including some with _highly_ sensitive data, have contracts with Anthropic or OpenAI that say they can't train future models on the data they send them and are perfectly happy to send data to Claude. You may think they're stupid to do that, but that's just your opinion.)
> the models are highly parallelizable. It would likely support 10-15 users at once.
Yes, I know that; I understand LLM internals pretty well. One instance of the model in the sense of one set of weights loaded across X number of GPUs; of course you can then run batch inference on those weights, up to the limits of GPU bandwidth and compute.
But are those 100 users you have on your own GPUs usings the GPUs evenly across the 24 hours of the day, or are they only using them during 9-5 in some timezone? If so, you're leaving your expensive hardware idle for 2/3 of the day and the third party providers hosting open weight models will still beat you on costs, even without getting into other factors like they bought their GPUs cheaper than you did. Do the math if you don't believe me.
Sure, but that’s an incredibly short term viewpoint.
Do you think a lot of people have “systems” to run a 1.6T model?
To me, the important thing isn't that I can run it, it's that I can pay someone else to run it. I'm finding Opus 4.7 seems to be weirdly broken compared to 4.6, it just doesn't understand my code, breaks it whenever I ask it to do anything.
Now, at the moment, i can still use 4.6 but eventually Anthropic are going to remove it, and when it's gone it will be gone forever. I'm planning on trying Deepseek v4, because even if it's not quite as good, I know that it will be available forever, I'll always be able to find someone to run it.
No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.
What type of system is needed to self host this? How much would it cost?
Depends how many users you have and what is "production grade" for you but like 500k gets you a 8x B200 machine.
Depends on fast you want it to be. I’m guessing a couple of $10k mac studio boxes could run it, but probably not fast enough to enjoy using it.
One GB200 NVL72 from Nvidia would do it. $2-3 million, or so. If you're a corporation, say Walmart or PayPal, that's not out of the question.
If you want to go budget corporate, 7 x H200 is just barely going to run it, but all in, $300k ought to do it.
How many users can you serve with that?
For the H200, between 150-700. The GB200 gets you something like 2-10k users.
$20K worth of RTX 6000 Blackwell cards should let you run the Flash version of the model.
Not really - on prem llm hosting is extremely labor and capital intensive
But can be, and is, done. I work for a bootstrapped startup that hosts a DeepSeek v3 retrain on our own GPUs. We are highly profitable. We're certainly not the only ones in the space, as I'm personally aware of several other startups hosting their own GLM or DeepSeek models.
Why a retrain? What are you using the model for?
Completely agree, not suggesting it needs ot just genuinely curious. Love that it can be run locally though. Open source LLMs punching back pretty hard against proprietary ones in the cloud lately in terms of performance.
What's the hardware cost to running it?
Probably like 100 USD/hour
I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:
- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).
- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.
- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).
Truly an insane industry. This is a good reminder of why datacenter capex from since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
All these number are peanuts to a mid sized company. A place I worked at used to spend a couple million just for a support contract on a Netapp.
10 years from now that hardware will be on eBay for any geek with a couple thousand dollars and enough power to run it.
That article is a total hallucination.
"671B total / 37B active"
"Full precision (BF16)"
And they claim they ran this non-existent model on vLLM and SGLang over a month and a half ago.
It's clickbait keyword slop filled in with V3 specs. Most of the web is slop like this now. Sigh.
"if you have to ask..."
... if you have 800 GB of VRAM free.
I remember reading about some new frameworks have been coming out to allow Macs to stream weights of huge models live from fast SSDs and produce quality output, albeit slowly. Apart from that...good luck finding that much available VRAM haha
Tbh I was more productive with 4.6 than ever before and if AI progress locks in permanently at 4.6 tier, I’d be pretty happy
It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.
It's about 2 months behind GPT 5.5 and Opus 4.7.
As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.
It should be obvious now why Anthropic really doesn't want you to run local models on your machine.
Vibes > Benchmarks. And it's all so task-specific. Gemini 3 has scored very well in benchmarks for very long but is poor at agentic usecases. A lot of people prefering Opus 4.6 to 4.7 for coding despite benchmarks, much more than I've seen before (4.5->4.6, 4->4.5).
Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.
With the ability of the Qwen3.6 27B, I think in 2 years consumers will be running models of this capability on current hardware.
What's going to change in 2 years that would allow users to run 500B-800B parameter models on consumer hardware?
I think its just an estimate
But the question remains
No, the Deepseek V4 paper itself says that DS-V4-Pro-Max is close to Opus 4.5 in their staff evaluations, not better than 4.6:
> In our internal evaluation, DeepSeek-V4-Pro-Max outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5.
Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?
If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.
Apparently glm5.1 and qwen coder latest is as good as opus 4.6 on benchmarks. So I tried both seriously for a week (glm Pro using CC) and qwen using qwen companion. Thought I could save $80 a month. Unfortunately after 2 days I had switched back to Max. The speed (slower on both although qwen is much faster) and errors (stupid layout mistakes, inserting 2 footers then refusing to remove one, not seeing obvious problems in screenshots & major f-ups of functionality), not being able to view URLs properly, etc. I'll give deepseek a go but I suspect it will be similar. The model is only half the story. Also been testing gpt5.4 with codex and it is very almost as good as CC... better on long running tasks running in background. Not keen on ChatGPT codex 'personality' so will stick to CC for the most part.
Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.
I appreciate this, makes me trust it more than benchmarks.
In case people wonder where the announcement is (you can easily translate it via browser if you don't read Chinese): https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg
It's still a "preview" version atm.
That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.
They use VPN to access. Even Google Deepmind uses Anthropic. There was a fight within Google as to why only DeepMind is allowed to Claude while rest of the Google can't.
> (better than Opus 4.6)
There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.
Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.
This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.
The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.
Anthropic fans would claim God itself is behind Opus by 3-6 months and then willingly be abused by Boris and one of his gaslighting tweets.
LMAO
> Anthropic fans ...
I have no idea why you'd think that, but this is straight from their announcement here (https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg):
> According to evaluation feedback, its user experience is better than Sonnet 4.5, and its delivery quality is close to Opus 4.6's non-thinking mode, but there is still a certain gap compared to Opus 4.6's thinking mode.
This is the model creators saying it, not me.
For the curious, I did some napkin math on their posted benchmarks and it racks up 20.1 percentage point difference across the 20 metrics where both were scored, for an average improvement of about 2% (non-pp). I really can't decide if that's mind blowing or boring?
Claude4.6 was almost 10pp better at at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).
FWIW it's also like 10x cheaper.
The dragon awakes yet again!
There appears a flight of dragons without heads. Good fortune.
That's literally what the I Ching calls "good fortune."
Competition, when no single dragon monopolizes the sky, brings fortune for all.
Pop?
On a seperate note, I am guessing that all the new models have announced in the space of a few days because the time to train a model is the same for each AI company.
Which strikes me as odd - Inwoukd have assumed someone had an edge in terms of at least 10% extra GPUs.
But why would they all start at the same time?
Are there better providers for inferencing this right now? I know it's launch day, but openrouter showing 30tps isn't looking great.
They released 1.6 T pro base model on huggingface. First time I'm seeing a "T" model here.
Kimi K2.5 and K2.6 are both >1T
The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.
Price is appealing to me. I have been using gemini 3 flash mainly for chat. I may give it a try.
input: $0.14/$0.28 (whereas gemini $0.5/$3)
Does anyone know why output prices have such a big gap?
Output is what the compute is used for above all else; costs more hardware time basically than prompt processing (input) which is a lot faster
input tokens are processed at 10-50 times the speed of output tokens since you can process then in batches and not one at a time like output tokens
Just tested it via openrounter in the Pi Coding agent and it regularly fails to use the read and write tool correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"
FWIW, works great in Claude Code.
https://api-docs.deepseek.com/guides/coding_agents#integrate...
They have just released it, give it some time, they probably haven't pretested it with Pi
How can they fix it after the release? They would have to retrain/finetune it further, no?
It's only in preview right now. And anyway, yes, models regularly get updated training.
But in this case, it's more likely just to be a tooling issue.
This is shockingly cheap for a near frontier model. This is insane.
For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.
Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.
> I am uncomfortable about sending user data which may contain PII to their servers in China
As a European I feel deeply uncomfortable about sending data to US companies where I know for sure that the government has access to it.
I also feel uncomfortable sending it to China.
If you'd asked me ten years ago which one made me more uncomfortable. China.
But now I'm not so sure, in fact I'm starting to lean towards the US as being the major risk.
Right now Im much more worried about sending data to the US and A.. At least theres a less chanse it will be missused against -me-
> For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
It's doesn't seem all that out there compared to the other Chinese model price/performance? Kimi2.6 is cheaper even than this, and is pretty close in performance
Kimi is indeed somewhat cheap for frontier-level intelligence, but still is $4-5 per mm tokens. Deep Seek is at least an order of magnitude cheaper.
865 GB: I am going to need a bigger GPU.
Or several bigger GPUs! :)
I like this. The more competitors there are, the more we the users benefit.
I know people don't like Twitter links here but the main link just goes to their main docs site generic 'getting started' page.
The website now has a link to the announcement on Twitter here https://x.com/deepseek_ai/status/2047516922263285776
Copying text of that below
DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
Just use xcancel by adding 'cancel' to the link
https://xcancel.com/deepseek_ai/status/2047516922263285776
I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
This is just a random thought, but have you tried doing an 'agentic' pelican?
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily corellated with their in-harness ability, the latter which really matters.
I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.
I should try it again with the more recent models.
No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.
The pro pelican is a work of art! It goes dimensions that no other LLM has gone before.
yeah. look at these 4 feathers (?) on his bum too.
a lot of dumplings
The Flash one is pretty impressive. Might be my favorite so far in the pelican-riding-a-bicycle series
DeepSeek pelicans are the angriest pelicans I’ve seen so far.
they're just late for work.
996 Pelican, lol
Being a bicycle geometry nerd I always look at the bicycle first.
Let me tell you how much the Pro one sucks... It looks like failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
[2] https://en.wikipedia.org/wiki/Lowrider_bicycle
[3] https://www.rivbike.com/
This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
Some other reactions:
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro one uses 16 spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.
I think the pelican on a bike is known widely enough that of seizes to be useful as a benchmark. There is even a pelican briefly appearing in the promo video of GPT-5, if I'm not mistaken https://openai.com/gpt-5/. So the companies are apparently aware of it.
It was a bigger deal in the Gemini 3.1 launch: https://x.com/JeffDean/status/2024525132266688757
To me this is the perfect proof that
1) LLM is not AGI. Because surely if AGI it would imply that pro would do better than flash?
2) and because of the above, Pelican example is most likely already being benchmaxxed.
What was your prompt for the image? Apologies if this should be obvious.
>Generate an SVG of a pelican riding a bicycle
at the top of the linked pages.
Is it then Deepseek hosted by Deepseek?
How much does the drawing change if you ask it again?
I really like the pro version. The pelican is so cute.
Where is the GPT 5.5 Pelican?
https://news.ycombinator.com/item?id=47879092#47880421
In the 5.5 topic
Why they so angry?
This should not be the top comment on every model release post. It's getting tiring.
This should be the bottom comment on the pelican comment on every model release post.
Clearly the top comment should be "Imagine a beowulf cluster of Deepseek v4!"
My mother was murdered by Beowulf, you insensitive Claude!
This was perfect.
At this point 'frontier model release' is a monthly cadence, Kimi 2.6 Claude 4.6 GPT 5.5, the interesting question is which evals will still be meaningful in 6 months.
more like weekly or almost daily, gpt 5.5 was literally 12 hours ago
MMLU-Pro:
Gemini-3.1-Pro at 91.0
Opus-4.6 at 89.1
GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5
Pretty impressive
Funny how Gemini is theoretically the best -- but in practice all the bugs in the interface mean I don't want to use it anymore. The worst is it forgets context (and lies about it), but it's very unreliable at reading pdfs (and lies about it). There's also no branch, so once the context is lost/polluted, you have to start projects over and build up the context from scratch again.
The sheer number of bugs and lack of meaningful improvements in Google products is a clear counterargument to the AI bull thesis
If AI was so good at coding, why can’t it actually make a usable Gemini/AI Studio app?
I think Google might just be institutionally incapable of making good UX
Yeah if I could use Gemini with pi.dev that would be my choice. But Gemini CLI is just so, so bad.
I gave up on Gemini 3.1 Pro in VSCode after 2 hours. They fully refunded me.
Feels like the real story here is cost/performance tradeoff rather than raw capability. Benchmarks keep moving incrementally, but efficiency gains like this actually change who can afford to build on top.
What's the current best framework to have a 'claude code' like experience with Deepseek (or in general, an open-source model), if I wanted to play?
https://pi.dev/
https://opencode.ai/
You can use deepseek with Claude code
You can, but does it work well? I assume CC has all kinds of Claude specific prompts in it, wouldn't you be better with a harness designed to be model agnostic like pi.dev or OpenCode?
I've been using all Kimi K2.6, gpt-5.4 and now Deepseek v4 (thought not extensively yet) in Claude Code and I can say it works much better than you'd expect. It looks like the system prompt and tools are pulling a lot of weight. Maybe the current models are good enough that you don't need them to be trained for a specific harness.
You can use CC with other models, you aren’t forced to use Claude model.
claude-code-cli/opencode/codex
I don't mind that High Flyer completely ripped off Anthropic to do this so much as I mind that they very obviously waited long enough for the GAB to add several dozen xz-level easter eggs to it.
It is great! I asked the question what I always ask of new models ("what would Ian M Banks think about the current state of AI") and it gave me a brilliant answer! Funny enough the answer contained multiple criticisms of his own creators ("Chinese state entities", "Social Credit System").
Is there a harness that is as good as cloud code that can be used with open weight models?
I prefer OpenCode over Claude Code, and it works with basically everything. Give it a try. ymmv
I've liked Hermes agent, but never used Claude code so don't know how it compares
Try pi coding agent!
Never used Claude myself but there are agents that can use local model. I.e. - Jetbrains Junie - Mistral Vibe
For those who didn't check the page yet, it just links to the API docs being updated with the upcoming models, not the actual model release.
Weights are on Huggingface FWIW. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/tree/main
My submission here https://news.ycombinator.com/item?id=47885014 done at the same time was to the weights.
dang, probably the two should be merged and that be the link
there's no pinging. Someone's gotta email dang
Oh well, I should have bought 2x 512GB RAM MacStudios, not just one :(
Excited that the long awaited v4 is finally out. But feel sad that it's not multimodal native.
Such different time now than early 2025 when people thought Deepaeek was going to kill the market for Nvidia.
Actually the fact the inference of a SOTA model is completely Nvidia-free is the biggest attack to Nvidia every carried so far. Even American frontier AI labs may start to buy Chinese hardware if they need to continue the AI race, they can't keep paying so much money for the GPUs, especially once Huawei training versions of their GPUs will ship.
They might still kill the market for NVIDIA, if future releases prioritize Huawei chips
Already over a billion tokens on open router in under 5 hours
Is V4 still not a multi-modal model?
Not yet... Which is a shame.
Looking forward to DeepSeek Coding Plan
I came here to say the same :) !
MErge? https://news.ycombinator.com/item?id=47885014
Is there a Quantized version of this?
They have released mixed fp8/fp4 for efficiency. It's still hundreds of gigabytes, though. Give up on local for these.
This FLash model might be affordable for OpenClaw. I run it on my mac 48gb ram now but it's slowish.
So is this the first AI lab using MUON for their frontier model?
No, Muon was developed by Moonshot; they've been using it in their Kimi models since Kimi K2 in 2025.
A few hours after GPT5.5 is wild. Can’t wait to try it.
Any way to connect this to claude code?
As posted below https://api-docs.deepseek.com/guides/coding_agents#integrate...
It's literally in the linked docs.
SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here
Which version fits in a Mac Studio M3 Ultra 512 GB?
The Flash one should - it's 160GB on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/ma...
So, dual RTX PRO 6000
Any visualised benchmark/scoreboard for comparison between latest models? DeepSeek v4 and GPT-5.5 seems to be ground breaking.
How can you reasonably try to get near frontier (even at all tps) on hardware you own? Maybe under 5k in cost?
For flash? 4 bit quant, 2x 96GB gpu (fast and expensive) or 1x 96GB gpu + 128GB ram (still expensive but probably usable, if you’re patient).
A mac with 256 GB memory would run it but be very slow, and so would be a 256GB ram + cheapo GPU desktop, unless you leave it running overnight.
The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.
Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.
Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
It seems to be 160GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250GB+. (B)F16 at around double I would assume.
There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
Run on an old HEDT platform with a lot of parallel attached storage (probably PCIe 4) and fetch weights from SSD. You'd ultimately be limited by the latency of these per-layer fetches, since MoE weights are small. You could reduce the latencies further by buying cheap Optane memory on the second-hand market.
The same way you fit a bucket wheel excavator in your garage
Very carefully
A loaded macbook pro can get you to the frontier from 24 months ago at ~10-40tok/s, which is plenty fast enough for regular chatting.
The low end could be something like an eBay-sourced server with a truckload of DDR3 ram doing all-cpu inference - secondhand server models with a terabyte of ram can be had for about 1.5K. The TPS will be absolute garbage and it will sound like a jet engine, but it will nominally run.
The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
More like 500k
lots of great stuff, but the plot in the paper is just chart crime. different shades of gray for references where sometimes you see 4 models and sometimes 3.
Does deepseek has any coding plan?
no
giving meta a run for its money, esp when it was supposed to be the poster child for OSS models. deepseek is really overshadowing them rn
Meta is totally directionless
Interesting note:
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
So it's going to be even cheaper
The paper is here: [0]
Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.
One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.
There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.
I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.
Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
[1] https://news.ycombinator.com/item?id=47793880
[2] https://arxiv.org/abs/2512.24880
[3] https://news.ycombinator.com/item?id=46452172
> this is why Anthropic wants to ban open weight models
Do you have a source?
More like he wants to ban accelerator chip sales to China, which may be about “national security” or self preservation against a different model for AI development which also happens to be an existential threat to Anthropic. Maybe those alternatives are actually one and the same to him.
Using it with opencode sometimes it generates commands like:
like generating output, but not actually running the bash command so not creating the PR ultimately. I wonder if it's a model thing, or an opencode thing.Anyone tried with make web UI with it? How good is it? For me opus is only worth because of it.
Has anyone used it? How does it compare to gpt 5.5 or opus 4.7?
I got an API key without credit card details I didn’t know they had a free plan.
How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.
Unsloth often turn them around within a few hours, they might have gone to bed already though!
Keep an eye on https://huggingface.co/unsloth/models
Update ten minutes later: https://huggingface.co/unsloth/DeepSeek-V4-Pro just appeared but doesn't have files in yet, so they are clearly awake and pushing updates.
"2 minutes ago" https://huggingface.co/unsloth/DeepSeek-V4-Pro
Those are quants, not distills.
Weren't there some frameworks recently released to allow Macs to stream weights from fast SSDs and thus fit way more parameters than what would normally fit in RAM?
I have never tried one yet but I am considering trying that for a medium sized model.
I've been calling that the "streaming experts" trick, the key idea is to take advantage of Mixture of Expert models where only a subset of the weights are used for each round of calculations, then load those weights from SSD into RAM for each round.
As I understand it if DeepSeek v4 Pro is a 1.6T, 49B active that means you'd need just 49B in memory, so ~100GB at 16 bit or ~50GB at 8bit quantized.
v4 Flash is 284B, 13B active so might even fit in <32GB.
The "active" count is not very meaningful except as a broad measure of sparsity, since the experts in MoE models are chosen per layer. Once you're streaming experts from disk, there's nothing that inherently requires having 49B parameters in memory at once. Of course, the less caching memory does, the higher the performance overhead of fetching from disk.
> ~100GB at 16 bit or ~50GB at 8bit quantized.
V4 is natively mixed FP4 and FP8, so significantly less than that. 50 GB max unquantized.
Ahh, that actually makes more sense now. (As you can tell, I just skimmed through the READMEs and starred "for later".)
My Mac can fit almost 70B (Q3_K_M) in memory at once, so I really need to try this out soon at maybe Q5-ish.
Streaming weights from RAM to GPU for prefill makes sense due to batching and pcie5 x16 is fast enough to make it worthwhile.
Streaming weights from RAM to GPU for decode makes no sense at all because batching requires multiple parallel streams.
Streaming weights from SSD _never_ makes sense because the delta between SSD and RAM is too large. There is no situation where you would not be able to fit a model in RAM and also have useful speeds from SSD.
There have been some very interesting experiments with streaming from SSD recently: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/
These are more like experiments than a polished release as of yet. And the reduction in throughput is high compared to having the weights in RAM at all times, since you're bottlenecked by the SSD which even at its fastest is much slower than RAM.
Do you have the links for those? Very interested
Sure!
Note: these were just two that I starred when I saw them posted here. I have not looked seriously at it at the moment,
https://github.com/danveloper/flash-moe
https://github.com/t8/hypura
Aaaand it cant still name all the states in India,or say what happened in 1989
Ask Claude how to overthrow a Nazi dictatorship in the US.
Incredible model quality to price ratio
We will be hosting it soon at getlilac.com!
Amaze amaze amaze
Better link:
https://news.ycombinator.com/item?id=47885014
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
congrats
Ah now !
OMG
OMG ITS HAPPENING
I hope the update is an improvement. Losing 3.2 would be a real loss, it's excellent.
History doesn't always repeat itself.
But if it does, then in the following week we'll see DeepSeek4 floods every AI-related online space. Thousands of posts swearing how it's better than the latest models OpenAI/Anthropic/Google have but only costs pennies.
Then a few weeks later it'll be forgotten by most.
It's difficult because even if the underlying model is very good, not having a pre-built harness like Claude Code makes it very un-sticky for most devs. Even at equal quality, the friction (or at least perceived friction) is higher than the mainstream models.
OpenCode? Pi?
If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.
The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.
I think you mean ollama vs llama.cpp.
I do!
Damn autocorrect :)
I call it autocorrupt :)
They have instructions right on their page on how to use claude code with it.