I've been trying deepseek-v4-flash in OpenCode (via OpenRouter) and I'm blown away. It's no Opus, obviously, but it had zero issues with any regular coding task I threw at it. v4-flash is remarkably "good enough" for what I needed. The whole evening of coding cost me $0.52 in API credits.
How does it compare to popular local inference engines, e.g. ollama, lm studio, or hand-rolled llama.cpp? I saw a brief benchmark in the readme but wasn't sure if there was more.
Using it in Kagi Assistant is stupidly slow, though. I get something like 10 t/s, while it's pretty fast in the official app, for example.
Kagi Assistant is also kind of broken when using Qwen 3.6 Plus.
So, beware of using these models in Kagi at the moment.