17 comments

  • hank808 44 minutes ago

    You guys that continue to compare DGX Spark to the Mac Studios, please remember two things:

    1. Virtually every model you'd run was developed on Nvidia gear and will run on Spark.

    2. Spark has fast-as-hell interconnects, the sort you'd actually want in an AI data center, so you can use more than one Spark at the same time, use RDMA, and start to figure out how and why things work the way they do. You can do a lot with 200 Gb/s of interconnect. (See the back-of-envelope below.)
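    (Back-of-envelope for scale: 200 Gb/s is about 25 GB/s each way, so moving a 10 GiB weight shard between two Sparks takes roughly 0.4 s. Slow next to local memory (~273 GB/s), but fast enough to make multi-node RDMA experiments practical on a desk.)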

  • SethTro 3 hours ago

    The article doesn't seem to mention the price, which is $4,000. That makes it comparable to a 5090, but with 128GB of unified LPDDR5x vs. the 5090's 32GB of GDDR7.

    • EnPissant an hour ago

      A 5090 is $2000.

    • CamperBob2 3 hours ago

      And about 1/6 the memory bandwidth, which is what matters for inference.
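      (Going by the published specs: the 5090's GDDR7 does ~1.8 TB/s and the Spark's LPDDR5x does ~273 GB/s, so roughly a 6.5x gap. Decode speed scales with how fast you can stream the active weights, so that ratio shows up almost directly in tokens/sec.)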

    • nialse 3 hours ago

      Well, that’s disappointing, since the Mac Studio 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB of RAM, it would eat the Nvidia Spark’s lunch every day.

      • newman314 2 hours ago

        Agreed. I also wonder why they chose to test against a Mac Studio with only 64GB instead of 128GB.

        • yvbbrjdr 2 hours ago

          Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that one of my friends has this device.

          • ggerganov 2 hours ago

            FYI you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:

              ggml_cuda_init: found 1 CUDA devices:
                Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes
              | model                          |       size |     params | backend    | ngl | n_ubatch | fa |            test |                  t/s |
              | ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | --------------: | -------------------: |
              | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |          pp4096 |       3564.31 ± 9.91 |
              | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |            tg32 |         53.93 ± 1.71 |
              | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |          pp4096 |      1792.32 ± 34.74 |
              | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |            tg32 |         38.54 ± 3.10 |
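
            (For anyone reproducing these: the pp4096/tg32 columns correspond to a llama-bench run along these lines. Assumes a recent llama.cpp CUDA build; the model filename is a placeholder for your local GGUF.)

              llama-bench -m gpt-oss-120b-mxfp4.gguf -ngl 99 -ub 2048 -fa 1 -p 4096 -n 32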
      • moondev an hour ago

        Just don't try to run NCCL.

  • pixelpoet 3 hours ago

    I wonder why they didn't test against the broadly available Strix Halo, with 128GB of memory at 256 GB/s and a 16-core full-fat Zen 5 with AVX-512, at $2k... it is a mystery...

    • yvbbrjdr 2 hours ago

      Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that none of my friends has this device.

      • EnPissant an hour ago

        Something is wrong with your numbers: gpt-oss-20b and gpt-oss-120b should be much much faster than what you are seeing. I would suggest you familiarize yourself with llama-bench instead of ollama.

        Running gpt-oss-120b on an RTX 5090 with 2/3 of the experts offloaded to system RAM (which has less than half the memory bandwidth of this thing), my machine gets ~4100 tps prefill and ~40 tps decode.

        Your spreadsheet shows the Spark getting ~94 tps prefill and ~11 tps decode.

        Now, it's expected that my machine should slaughter this thing in prefill, but decode should be very similar, or the Spark a touch faster.
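
        (To replicate that setup, something like the following should work, assuming a llama.cpp build recent enough that llama-bench has --n-cpu-moe. gpt-oss-120b has 36 layers, so keeping ~24 layers' experts on the CPU approximates the 2/3 split; the model filename is a placeholder.)

          llama-bench -m gpt-oss-120b-mxfp4.gguf -ngl 99 -fa 1 --n-cpu-moe 24 -p 4096 -n 32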

        • yvbbrjdr 30 minutes ago

          We actually profiled one of the models and saw that the last GEMM, which is completely memory-bound, takes a disproportionately long time, which reduces the token speed by a lot.
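
          (Back-of-envelope for why that's plausible: at batch size 1 a GEMM degenerates to a GEMV, which reads the entire weight matrix for roughly 1 FLOP per byte, so its runtime is floored at weight bytes / memory bandwidth. If that projection were, say, 1 GiB of weights (hypothetical figure), ~273 GB/s gives ~4 ms per token from that one layer alone.)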

    • EnPissant an hour ago

      Strix Halo has the problem that prefill is incredibly slow unless your context is very small.

      The only thing that might be interesting about this DGX Spark is that its prefill manages to be faster due to better compute. I haven't compared the numbers yet, but they are included in the article.
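
      (Rough intuition for why compute matters there: prefill multiplies the weights against the whole prompt at once, so each byte of weights loaded feeds hundreds of FLOPs and the limit is the GPU's compute. Decode streams all the active weights for every single token, ~1 FLOP per byte, so the limit is memory bandwidth. With the Spark at ~273 GB/s and Strix Halo at ~256 GB/s, decode should be close, and any real gap shows up in prefill.)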