A few words on DS4

(antirez.com)

53 points | by caust1c an hour ago

13 comments

  • kamranjon 16 minutes ago

    Just want to mention that I've been pulling down and using DwarfStar locally and it's incredible. I actually have it running on my personal MacBook M4 Max with 128GB of RAM, and I'm running the server to share it over Tailscale with my work laptop, where I just have pi running.

    The long context reasoning is something I haven't even seen in frontier models - I was running at 124k tokens earlier and it was still just buzzing along with no issues or fatigue.

    I am amazed at how well it works. I'm using it right now for some pretty complex frontend work, and for me it is much, much faster than running a dense 27B or 31B model like Qwen or Gemma (the benefits of MoE). But the long-context capabilities are what have been absolutely flooring me.

    Super excited about this project and hope Antirez can keep himself from burning out. I've been following the repo pretty closely; there are a ton of PRs flooding in, and it seems like he's had to filter out a lot of slop code.

    • le-mark 7 minutes ago

      Is DS4 DwarfStar 4 or DeepSeek 4?

      • kamranjon 6 minutes ago

        Just updated! Sorry, I meant DwarfStar - it's the only way I've actually managed to run DeepSeek flash on my local hardware.

      • wolttam 6 minutes ago

        DwarfStar 4 is DeepSeek 4 (check the repo)

  • simonw 33 minutes ago

    I got this running on a 128GB M5 the other day - pretty painless. The model runs in about 80GB of RAM, and it seemed very capable at writing code and executing tools.

    • perfmode 22 minutes ago

      How’s the token throughput / response time?

      • simonw 19 minutes ago

        Healthy!

          prefill: 30.91 t/s, generation: 29.58 t/s
        
        From https://gist.github.com/simonw/31127f9025845c4c9b10c3e0d8612...

        • xienze 9 minutes ago

          I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens in context, you're sitting there for 5+ minutes before the first token is generated.

          • aiscoming 7 minutes ago

            if it's just the coding agent system prompt and tools, you can cache that

            • xienze 2 minutes ago

              Yeah the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
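
The back-of-the-envelope numbers in this subthread can be checked with a few lines of Python. This is only a rough sketch: the function name `prefill_wait_seconds` and the 4,000-token cached prefix are made up for illustration, and it assumes prefill throughput stays constant over the whole context and that cached tokens cost nothing to reuse. Only the 30.91 t/s figure and the 10k context size come from the comments above.

```python
def prefill_wait_seconds(context_tokens: int, prefill_tps: float,
                         cached_tokens: int = 0) -> float:
    """Estimate seconds spent on prefill before the first generated token.

    Assumes constant prefill throughput and zero cost for cached tokens.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    # Only tokens that are not already in the prompt cache need prefill.
    uncached = max(context_tokens - cached_tokens, 0)
    return uncached / prefill_tps


PREFILL_TPS = 30.91  # prefill rate reported in the thread

# Cold start: the full 10k-token context has to be processed.
cold = prefill_wait_seconds(10_000, PREFILL_TPS)

# Hypothetical: a 4,000-token system prompt plus tool definitions is cached.
warm = prefill_wait_seconds(10_000, PREFILL_TPS, cached_tokens=4_000)

print(f"cold: {cold / 60:.1f} min")
print(f"warm: {warm / 60:.1f} min")
```

Under these assumptions the cold case comes out to roughly 5.4 minutes, in line with the "5+ minutes" estimate, and caching a fixed prefix only removes that prefix's share of the wait; tool results and file reads added later still go through prefill.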

  • bjconlan an hour ago

    This is great! I feel the same way about the DeepSeek V4 architecture for commodity hardware.

    Also have enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (but early days for me understanding this space)

    • kamranjon 2 minutes ago

      Very cool! I had no idea that HF was doing this - I really love their small model experiments.