Ask HN: Running local LLMs? What's your model and hardware

9 points | by alfiedotwtf 8 hours ago

6 comments

I have a 16 GB Intel A770 and before that used an AMD Mi25.

I've had SDXL Stable Diffusion working on both, but struggled to get LLMs going. Software development as a field is already well known for its technical debt and lack of interest in testing (see also: https://xkcd.com/2030/), but anything to do with AI takes it to a whole new level.

You pretty much need to run the same stack the developer used, down to the correct outdated version of Python and every library in use, as well as the same GPU drivers and OS version, or the whole thing falls apart.
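
For what it's worth, a minimal sketch of recording the stack that did work, so it can be compared against or rebuilt later. It only uses the Python standard library; the function names `snapshot_environment` and `to_requirements` are made up for illustration, and it does not capture GPU driver versions, which would need vendor-specific tools.

```python
import platform
from importlib import metadata


def snapshot_environment():
    """Return a dict describing the interpreter, OS, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def to_requirements(snapshot):
    """Format the package pins as requirements.txt-style lines."""
    return [f"{name}=={version}" for name, version in snapshot["packages"].items()]


if __name__ == "__main__":
    snap = snapshot_environment()
    print(f"# python {snap['python']} on {snap['os']}")
    print("\n".join(to_requirements(snap)))
```

Saving that output next to a working setup at least tells you which versions to pin when it breaks after an upgrade.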

Of course, various hardware vendors port everything to their own hardware, so I could, for example, run Intel's OpenVINO version of llama.cpp. But I have the wrong Linux version to run their binaries, I didn't want to put in the effort of switching to a new OS, and my computer couldn't finish compiling it overnight, so I gave up on it.

Of course, I could put it all in a VM, but then I'd take a performance hit and need even more RAM.

msalsas 2 hours ago

Qwen3.6 35B A3B on an MSI laptop with an RTX 5080 (16 GB VRAM)

roscas 7 hours ago

qwen3-coder:30b

codestral:22b

codegemma:7b

codellama:34b

north-mini-code-1.0:q8_0

laguna-xs.2:latest

Currently testing the models above on an AMD Ryzen 5 3600X with 48 GB of RAM and an NVIDIA 3080 with 10 GB of VRAM.

Favorite model is laguna-xs.2 because it is really fast on CPU and very good.
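
As a rough back-of-the-envelope check on which of these fit in 10 GB of VRAM, the weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below assumes that formula plus a fixed `overhead_gb` for KV cache and runtime buffers, which is a made-up placeholder; the real overhead depends on context length and backend. The quantization of most models listed above isn't stated, so the 4-bit and 8-bit figures are assumptions too.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead_gb=1.5):
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    needed = weight_memory_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Assumed sizes and quantization levels, for illustration only.
    for name, params, bits in [("30B @ 4-bit", 30, 4), ("7B @ 8-bit", 7, 8)]:
        size = weight_memory_gb(params, bits)
        print(f"{name}: ~{size:.1f} GB weights, fits in 10 GB: {fits_in_vram(params, bits, 10)}")
```

By this estimate a 30B model at 4-bit needs about 15 GB for weights, so on a 10 GB card part of it would have to be offloaded to system RAM and run on the CPU.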

alfiedotwtf 7 hours ago

Oh! Looks like I’ve been sleeping on Laguna!

alfiedotwtf 7 hours ago

If you’re able to run qwen3-coder, have you thought about Qwen 3.6 27B or 35B? Looking at benchmarks, 3.6 looks like it’s gained a lot over qwen3-coder.

cyanydeez 5 hours ago

qwen 3.6 35B on 128GB strix halo.

perfect speed to not melt the brain and can extend context for well scoped projects.

need to work with dynamic context pruning to ensure full reuse in larger projects.
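
One simple form of context pruning, sketched under assumptions: keep the system messages and drop the oldest turns until the rest fits a token budget. This is not necessarily what any particular tool does. The message format (`role`/`content` dicts) and the word-count token estimate in `estimate_tokens` are both stand-ins; a real setup would use the model's tokenizer.

```python
def estimate_tokens(text):
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def prune_context(messages, budget):
    """Keep system messages, then the newest other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(others):  # walk newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break  # stop at the first message that does not fit
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Dropping from the oldest end keeps recent turns intact, at the cost of changing the start of the prompt every time something is cut.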

deer-flow seems to work well for project scoping and high level evals. opencode for coding.