Sycophancy is noticeably higher, and a couple of tests in domains where I can assess output quality from my own expertise (outline a parts-buyer workflow given some proprietary details, explain why measures can't distinguish between two given countable subsets of the transcendentals, write a contrarian defense of Thrasymachus, show how the SEC phase of UEFI boot changed from pre-8 to 8 to 10 and 11) showed no difference in quality.
I'm gonna stick with v3-0324 and I recommend that others do the same.
even though I'm normally a fan of "release early and often," deepseek often loses some impact because they tend to release the model and the evals on different days. it wouldn't hurt anyone to just wait a day and release both together so that the conversations are more substantive.
ofc deepseek is getting the highest-order bit right: just train a good model and let everyone figure it out on their own time.
Interesting observation about the increased sycophancy. Your tests on specific domains are insightful. Seems like v3.1 might be a step back in practical quality. Thanks for sharing your experience, I'll probably hold off on upgrading for now too.
Are there any benchmarks or comparisons against gpt-oss? I believe it far exceeds gpt-oss or even gpt-5; otherwise they wouldn't have released it.
the model was released literally one hour ago so we need to be a little bit more patient.
Scores 71.6% on the Aider benchmark.
So it beats Claude 4 Opus on Aider Polyglot
https://xcancel.com/scaling01/status/1957890953026392212