I have a 16 GB Intel A770 and before that used an AMD MI25.
I've had SDXL Stable Diffusion working on both, but struggled to get LLMs going. Software development as a whole is already well known for its technical debt and lack of interest in testing (see also: https://xkcd.com/2030/), but anything having to do with AI takes it to a whole new level.
You pretty much need to run the same stack the developer used, down to the correct outdated version of Python and every library in use, as well as the same GPU drivers and OS version, or the whole thing falls apart.
Of course, various hardware vendors port everything to their own hardware, so I could, for example, run Intel's OpenVINO version of llama.cpp. But I have the wrong Linux version to run their binaries, and I didn't want to put in the effort of installing a new OS. I tried building it from source instead, but my computer couldn't finish compiling it overnight, so I gave up on it.
Of course, I could put it all in a VM, but then I'd take a performance hit and need even more RAM.
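For what it's worth, the "same stack or it falls apart" problem can at least be made visible. Below is a rough sketch, not anything a particular project ships, of recording a known-working Python environment and comparing it against another machine. The function names `snapshot_environment` and `diff_environments` are made up for illustration, and it only covers the Python version, OS string, and installed packages; GPU driver versions would need a vendor-specific query that is left out here.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect the interpreter version, OS string, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def diff_environments(expected, actual):
    """Return human-readable mismatches between two snapshots."""
    problems = []
    for key in ("python", "os"):
        if expected[key] != actual[key]:
            problems.append(f"{key}: expected {expected[key]}, got {actual[key]}")
    # Only packages listed in the expected snapshot are checked.
    for name, version in expected["packages"].items():
        found = actual["packages"].get(name)
        if found != version:
            problems.append(f"{name}: expected {version}, got {found or 'missing'}")
    return problems


if __name__ == "__main__":
    # Dump the current environment as JSON so it can be saved next to a project.
    json.dump(snapshot_environment(), sys.stdout, indent=2)
```

Saving the JSON from a machine where things work and running `diff_environments` against a fresh snapshot elsewhere would list what differs. It doesn't fix anything, but it narrows down which outdated version is the one that matters.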
Qwen3.6 35B A3B on an MSI laptop with an RTX 5080 (16 GB VRAM)
qwen3-coder:30b
codestral:22b
codegemma:7b
codellama:34b
north-mini-code-1.0:q8_0
laguna-xs.2:latest
Currently testing those above on an AMD Ryzen 5 3600X with 48 GB of RAM and an NVIDIA 3080 with 10 GB of VRAM.
Favorite model is laguna-xs.2 because it is really fast on CPU and very good.
Oh! Looks like I’ve been sleeping on Laguna!
If you’re able to run qwen3-coder, have you thought about 3.6 27B or 35B? Looking at benchmarks, 3.6 looks like it’s gained a lot over qwen3-coder.
qwen 3.6 35B on 128GB strix halo.
perfect speed to not melt the brain and can extend context for well scoped projects.
need to work with dynamic context pruning to ensure full reuse in larger projects.
deer-flow seems to work well for project scoping and high-level evals. opencode for coding.
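On the context pruning point, here is a minimal sketch of one simple form of it, assuming "pruning" means dropping the oldest turns to stay under a token budget while always keeping the system prompt. This is plain recency-based trimming, not necessarily what any of the tools above do. `prune_context` and `estimate_tokens` are hypothetical names, and the word-count token estimate is a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def prune_context(messages, budget, count=estimate_tokens):
    """Keep all system messages plus the newest other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    remaining = budget - sum(count(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest so recent turns win.
    for message in reversed(messages):
        if message["role"] == "system":
            continue
        cost = count(message["content"])
        if cost > remaining:
            break  # stop at the first turn that does not fit, so history stays contiguous
        kept.append(message)
        remaining -= cost
    # Restore chronological order, with system messages first.
    return system + list(reversed(kept))
```

Keeping the system messages fixed at the front means at least that part of the prompt stays identical between calls. Smarter schemes would score turns by relevance rather than age.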