Mini PC for local LLMs in 2026

(terminalbytes.com)

27 points | by charlieirish an hour ago

28 comments

  • dannyw 44 minutes ago

    > The 256 GB/s number is real, but for context, an Apple M5 Ultra hits ~800 GB/s on its unified memory

    The M5 Ultra has not even been announced.

    This article appears to be predominantly or entirely LLM-produced with little to no human review, and it contains numerous material, misleading errors.

    It also omits serious contenders that are worth at least a comparison, like the DGX Spark.

    • woadwarrior01 38 minutes ago

      It appears to be an LLM-generated affiliate link farm.

  • pjmlp an hour ago

    Currently, NVIDIA's mini PC, or the version licensed to Asus, is one of the few that I can actually buy with a fully OEM-supported Linux version pre-installed.

    One would expect that by now, buying desktop-class computers with a Linux experience in shops would be rather common.

    The Geekcom devices that it advertises as Linux-ready are actually sold with Windows pre-installed.

    I guess they mean WSL ready.

    • Neywiny an hour ago

      I would guess they mean it's ready for you to install Linux on it.

      • pjmlp 38 minutes ago

        Yeah. Ignoring the whole fragmentation that keeps happening in the desktop stack, The Year of Desktop Linux will never happen if only computer nerds get to build such systems, as has always been the case.

        Instead, normies get The Year of the Linux Kernel deployed in all kinds of consumer devices, and The Year of Linux VMs at retail.

  • mark_l_watson 39 minutes ago

    I bought a 32 GB Mac Mini over two years ago, and it has been great for experimenting with local models; now it's even useful for local coding (at a slow speed!) with models supporting large context sizes.

    With the current extreme RAM shortage, I deeply regret not buying a 64 GB Mac Mini a few months ago.

    I bet a zillion people feel the same way.

    • pjmlp 19 minutes ago

      Which is why the Mac Pro was actually relevant.

      Those of us in PC land can at least extend our machines or swap the GPU, even if it's pricey.

      Apple has lost the server and workstation market through its own decisions.

  • alexktz an hour ago

    Could articles that are obviously written by an LLM be posted with a flair?

    • aalam an hour ago

      "Here's the part that nobody talks about"

      "Two gotchas before you click buy"

      I really think an entropy-of-playfulness score could differentiate LLM output.

  • visarga 37 minutes ago

    Good research, but man, do I feel the LLM vibe shining through. That sustained information density...

    • jcgrillo 26 minutes ago

      Look closer, it really isn't good research

  • bluechair an hour ago

    “What’s the memory bandwidth (GB/s) of the device holding the model weights?”

    Isn’t the recommended option going to be dog-slow at 256 GB/s?

  • lkey 39 minutes ago

    This article was authored by AI. It contains hallucinated info compiled from random Reddit threads.

    • visarga 36 minutes ago

      Yes, I too think it's authored by AI, but can you indicate where it is wrong?

  • bachmeier an hour ago

    "Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it."

    Sorry, but this is not even close to "being honest", it's bad math. That calculation assumes you do nothing with the computer other than local inference.

    • hdgvhicv 41 minutes ago

      Doesn't that calculation assume you value your privacy and ownership at zero, too?

    • spwa4 15 minutes ago

      Huh, you make me curious. Let's actually do that calculation. Say you really do use AI 24/7/365; say by some miracle you can get 60 t/s on Qwen 3.6 27b; and say this PC cost $3,000 (you should be able to do this on a DGX Spark, or one of the non-NVIDIA models, e.g. the Dell one; $3,000 would be a good price, but not totally out of the question). And, of course, let's say these prices remain stable.

      So that gets you 1,892,160,000 tokens per year at full blast.

      If you go the OpenRouter, eh, route, you'd be charged $2 per million tokens (anywhere from $2 to $3.60 per million tokens) [1]. So the value you'd get from your machine at 100% utilization is 1,892 × $2 ≈ $3,784, up to 1,892 × $3.60 ≈ $6,800.

      So yeah, not counting electricity and your time, the machine "is worth it".

      [1] https://openrouter.ai/qwen/qwen3.6-27b/providers
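      The arithmetic above, sketched in a few lines (the 60 t/s rate, the $3,000 hardware price, and the $2–$3.60/Mtok API prices are all the comment's assumptions, not verified figures):

```python
# Back-of-the-envelope version of the calculation above.
# Assumptions from the comment: 60 tok/s sustained 24/7/365,
# $3,000 of hardware, and $2.00-$3.60 per million tokens on OpenRouter.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60          # 31,536,000
tokens_per_year = 60 * SECONDS_PER_YEAR        # 1,892,160,000

hardware_cost = 3_000
for price_per_mtok in (2.00, 3.60):
    api_equivalent = tokens_per_year / 1_000_000 * price_per_mtok
    print(f"${price_per_mtok:.2f}/Mtok -> ~${api_equivalent:,.0f} of API usage per year")
```

      Even at the $2/Mtok floor, full-blast usage would buy more API tokens per year than the machine costs; idle time and electricity move the number the other way in practice.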

  • jmyeet 36 minutes ago

    There's some mention of Apple silicon here, but it's worth expanding on. Macs have a unified memory architecture, so if you have a Mac with 64 GB of memory, the GPU can use all of it. This is potentially quite useful, but Apple silicon in general is limited by memory bandwidth. For comparison, an RTX 5090 does 1,792 GB/s. Here are some examples:

    - GMKTek EVO-X2: 120 GB/s reads, 212 GB/s writes

    - NVIDIA DGX Spark: 273 GB/s

    - Mac Mini M4: 120 GB/s, but only $600+

    - Mac Mini w/ M4 Pro: 273 GB/s ($2,199 for 64 GB)

    - Mac Studio M4 Max: 410 GB/s ($3,500 for 128 GB)

    - Mac Studio M3 Ultra: 819 GB/s ($5,500 for 96 GB)

    - MacBook Pro 16" with M5 Pro, 64 GB: 307 GB/s ($3,300)

    - MacBook Pro 16" with M5 Max, 128 GB: 460 GB/s ($5,399)
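    A rough way to see why those bandwidth numbers matter (a common rule of thumb, not from the article): generating each token of a dense model streams essentially all the weights from memory once, so bandwidth divided by the model's in-memory size gives an upper bound on decode speed:

```python
# Upper-bound decode speed for a memory-bound dense model:
# each token streams all weights once, so tok/s <= bandwidth / model size.
# Real-world throughput is lower; MoE models read fewer bytes per token.
def decode_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# e.g. a ~20 GB quantized model on a few of the machines listed above
for name, bw in [("DGX Spark", 273), ("M4 Max", 410), ("M3 Ultra", 819)]:
    print(f"{name}: <= {decode_ceiling(bw, 20):.1f} tok/s")
```

    The 20 GB model size is an illustrative assumption; the ceilings scale linearly with bandwidth, which is why the list above is sorted the way it is.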

    Sadly, Apple discontinued the 512GB Mac Studio. Mac Studios are a little long in the tooth now and due for an upgrade this year. I suspect that prices will be a lot higher given the RAM prices but we'll see.

  • jcgrillo 42 minutes ago

    I got a well-used HP Z840 with 256 GB ECC DDR4 and twin Xeons, ca. 2014. Then I slapped two AMD V640 32 GB passively cooled GPUs in it with some 3D-printed fan shrouds and two 1U 15k RPM fans each. They just fit! I needed to order a quad 8-pin power cable; the standard configuration has three 6-pin cables, but there are unused pins on the GPU power rail, and there are aftermarket suppliers.

    72 Xeon cores

    256GB ECC DDR4

    64GB VRAM

    $2200 total

    I run it on a 20A 240V outlet to make sure the power supply can deliver enough watts, but so far it's working pretty well. The eWaste LLM rig is probably not as good value for money as a new machine, but it gets the job done cheaper (for now).

    EDIT: IIRC this approach gets me more VRAM bandwidth than Strix Halo at the cost of fewer addressable GBs (but a lot more total system RAM), but I figured CPU offloading might make up for it?

    ALSO EDIT: Note you can get a 128 GB Strix Halo motherboard, minus power supply, fans, case, etc., from Framework for $2,200; that could work if you have some parts lying around.

  • croes an hour ago

    > 128GB Ryzen AI MAX+ 395, listed at $2,099.

    Wasn't that a discounted price?

    • cowmix 40 minutes ago

      I got mine almost exactly a year ago: $1,699 direct from GMKTEK. To think it retails for 2X that a year later blows my mind.

  • znpy an hour ago

    As somebody who has a vague interest in running local LLMs… the day I decide to burn cash on hardware, I might as well go all-in and get either a 128 GB Mac Studio or an NVIDIA DGX Spark (or some other equivalent GB10-based system).

    The 64 GB Mac Mini is also interesting, if anything because it is very likely to hold most of its value at resale.

    I’m keeping an eye on the next apple hardware refreshes, particularly for mac minis and mac studios.

    • edot 33 minutes ago

      I am in a similar boat to you, but I can't make the money math work. Local LLMs obviously have a privacy benefit, but DeepSeek V4 Flash (which you'll struggle to get running on any single Mac; you'd need at least 128 GB of RAM) is $0.14/Mtok input and $0.28/Mtok output on the API. You'd have to be just absolutely burning tokens to ever make this make sense.

      A Mac Studio M4 Max with 128 GB at $3,699 (if you can find it) would require 10 million tokens a day of mixed input-output for over 5 years to break even. At that point the hardware will be outdated compared to the SOTA models, which will probably still be cheap on hosted platforms.
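      A sketch of that break-even; the blended price is my assumption (a simple 50/50 midpoint of the quoted input and output rates), and a cheaper, input-heavier mix stretches the figure toward the five-plus years estimated above:

```python
# Break-even for the $3,699 Mac Studio vs. the quoted API prices.
# Blended $/Mtok is an assumed 50/50 mix of input and output tokens.
hardware_cost = 3_699
blended_per_mtok = (0.14 + 0.28) / 2           # $0.21/Mtok
tokens_per_day = 10_000_000

daily_api_cost = tokens_per_day / 1e6 * blended_per_mtok   # ~$2.10/day
breakeven_days = hardware_cost / daily_api_cost
print(f"~{breakeven_days:,.0f} days (~{breakeven_days / 365:.1f} years)")
```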

    • amelius an hour ago

      The models are good enough now, so I'm waiting for the day they start selling inference ASICs with 100x the token output speed. See the Taalas demo.

      • adityamwagh 42 minutes ago

        Taalas is a nice concept, but I don’t want to use the same model forever!

        • amelius 32 minutes ago

          Just buy a new one every few years, just like your phone and laptop. And sell the old one.

    • 2ndorderthought an hour ago

      I just use my gaming PC, so I can play games or code with assistance for fun. It's awesome because it's mine, and technically I can do whatever I want with it. Having a decent computer around plus lower-end laptops is pretty budget-friendly.

    • walthamstow an hour ago

      The 14-inch MacBook Pros with 64 GB are really good value, considering they're much more complicated machines than the Mini.