Show HN: Standalone TurboQuant KV Cache Inference

(github.com)

3 points | by g023 3 days ago

2 comments

  • santander_cl 3 days ago

    Starred immediately.

    This is exactly the kind of practical quantization work that makes running longer-context models on consumer GPUs actually feasible. Looking forward to seeing it generalized beyond the one model. Great stuff, g023.

  • ensotrade_tech 2 days ago

    What does it actually do?