DeepSeek-v3.1-Base

(huggingface.co)

25 points | by meetpateltech 2 days ago

9 comments

  • OgsyedIE 2 days ago

    Sycophancy is noticeably higher, and a couple of tests on domains where I can assess output quality from my own expertise (lay out a parts-buyer workflow given some proprietary details, explain why measures can't distinguish between two given countable subsets of the transcendentals, write a contrarian defense of Thrasymachus, show how the SEC phase of UEFI boot changed from pre-8 to 8 to 10 and 11) gave no difference in quality; the measure question is sketched below.

    I'm gonna stick with v3-0324 and I recommend that others do the same.
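
    For context on the measure prompt: the expected answer is presumably the standard fact that every countable set of reals has Lebesgue measure zero, so the measure assigns both subsets the same value. A minimal sketch, assuming the measure in question is Lebesgue measure $\lambda$ and $A = \{a_1, a_2, \dots\}$ is either countable subset of the transcendentals: cover each $a_n$ by an interval of length $\varepsilon/2^n$, so that

        $\lambda(A) \le \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon \quad \text{for every } \varepsilon > 0,$

    hence $\lambda(A) = 0$ for both subsets, and $\lambda$ cannot tell them apart.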

  • martianlantern 2 days ago

    Are there any benchmarks or comparisons against gpt-oss? I believe it far exceeds gpt-oss or even GPT-5, otherwise they wouldn't have released it.

    • rmoriz 2 days ago

      the model was released literally one hour ago so we need to be a little bit more patient.

      • swyx 2 days ago

        even though I'm normally a fan of "release early and often", deepseek often loses some impact because they tend to release the model and the evals on different days. it wouldn't hurt anyone to wait a day and release both together so that the conversations are more substantive.

        ofc deepseek is focusing on the highest-order bit: just train a good model and let everyone figure it out on their own time.

    • guluarte 2 days ago

      Scores 71.6% on the Aider benchmark.

  • nicohayes 2 days ago

    Interesting observation about the increased sycophancy. If domain-specific tests show no quality improvement, v3.1 may be a step back in practice; I'll probably hold off on upgrading for now too.