AI on a Budget: Recompiling Llama.cpp for Qwen3.5 Inference on an HP Z440

(jeanbaptistefleury.neocities.org)

2 points | by DAFtwinTurbo 7 hours ago

1 comment

  • DAFtwinTurbo 7 hours ago

    Hi hackernews,

    I wrote a small blog post about a little experiment I ran last weekend. The goal was to see whether I could get more tok/s out of llama.cpp running the latest Qwen3.5 models. I got a 5.5x performance increase by 1. recompiling with optimization flags and 2. switching to ik_llama.cpp!
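    The comment doesn't quote the exact build commands, so as a rough sketch, a native-optimized CMake build of llama.cpp might look like the following. The specific flags (`CMAKE_BUILD_TYPE=Release`, `GGML_NATIVE=ON`) are assumptions on my part, not taken from the post:

    ```shell
    # Sketch of a native-optimized rebuild; flags are illustrative,
    # not the exact ones from the post.
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp

    # Release build tuned for the host CPU (e.g. the Z440's Xeon);
    # GGML_NATIVE=ON enables -march=native style codegen in ggml.
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON
    cmake --build build --config Release -j"$(nproc)"
    ```

    ik_llama.cpp is a separate fork with its own CPU-focused optimizations and builds the same way from its own repo.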

    The build I'm using is the 750 USD rig from Digital Spaceport.

    Very cool to see more than usable speeds on a cheap setup like this.