Can I Buy Your KV Cache?

(arxiv.org)

22 points | by MediaSquirrel 2 hours ago

14 comments

  • lumost an hour ago

    The KV cache is order-dependent: each entry depends on all the tokens that come before it in the context.

    There are some transformation approaches for reusing the KV cache across inference requests, but none is in wide use because of accuracy concerns after the transformation.
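
    To make the order dependence concrete, here is a minimal toy sketch. It is not a real transformer: the integer "embeddings", the running-sum stand-in for causal attention, and the name `layer2_kv` are all assumptions made up for illustration. It only shows that the entry cached for a token changes when the tokens before it change, while an identical prefix yields identical entries.

```python
# Toy illustration only: hypothetical integer "embeddings", not a real model.
EMBED = {"the": 1, "cat": 2, "sat": 3, "dog": 5, "on": 7}


def layer2_kv(tokens):
    """Return the (key, value) pair a second layer would cache per token.

    A running sum stands in for causal attention in layer 1, so each
    hidden state depends on the token itself and everything before it.
    """
    kv = []
    running = 0
    for tok in tokens:
        running += EMBED[tok]            # sees only earlier tokens and itself
        hidden = EMBED[tok] + running    # stand-in for the layer-1 output
        kv.append((2 * hidden, 3 * hidden))  # stand-in K and V projections
    return kv


a = layer2_kv(["the", "cat", "sat"])
b = layer2_kv(["cat", "the", "sat"])
c = layer2_kv(["the", "cat", "sat", "on"])
d = layer2_kv(["dog", "cat", "sat"])

print(a[2] == d[2])   # False: "sat" after a different context gets a different entry
print(a[1] == b[0])   # False: "cat" moved to another position gets a different entry
print(c[:3] == a)     # True: an identical prefix produces identical entries
```

    The last line is the property that makes exact-prefix reuse safe; the first two are why splicing cached segments into a different context is not.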

    • Eridrus an hour ago

      The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.

    • dgellow an hour ago

      Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?

      • evrydayhustling an hour ago

        All major model providers offer prefix caching, which is this.
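
        A rough sketch of the idea, assuming a cache keyed by the exact token prefix. `PrefixCache`, `prefill`, and `fake_entry` are hypothetical names for illustration, not any provider's or llama.cpp's API, and real systems typically cache in blocks of tokens rather than per token.

```python
class PrefixCache:
    """Toy exact-match prefix cache: one stored entry per token prefix."""

    def __init__(self):
        self._store = {}  # tuple of tokens (a prefix) -> cached entry

    def prefill(self, tokens, compute_entry):
        """Return (entries, reused): reused counts tokens served from cache."""
        entries = []
        reused = 0
        for i in range(len(tokens)):
            key = tuple(tokens[: i + 1])
            if key in self._store:
                reused += 1                      # same prefix seen before
            else:
                self._store[key] = compute_entry(key)  # pay for this token
            entries.append(self._store[key])
        return entries, reused


calls = []


def fake_entry(prefix):
    # Stand-in for the expensive per-token KV computation.
    calls.append(prefix)
    return ("kv", prefix)


cache = PrefixCache()
_, r1 = cache.prefill(["sys", "doc", "q1"], fake_entry)
_, r2 = cache.prefill(["sys", "doc", "q2"], fake_entry)
_, r3 = cache.prefill(["doc", "sys", "q2"], fake_entry)
print(r1, r2, r3)  # 0 2 0
```

        The second request reuses the shared "sys", "doc" prefix; the third contains the same tokens in a different order and reuses nothing, because the match has to be an exact prefix.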

        • lumost 30 minutes ago

          No, reusing segments of the KV cache for different purposes in an order-independent manner is an active research area.

          • dgellow 20 minutes ago

            Any keyword or paper I can search for?

  • mistercow 43 minutes ago

    > Then the part that matters: where the KV lives

    When your abstract was clearly generated by an LLM and not curated to at least make it sound human, it does not make me want to read your paper.

  • TuringNYC 10 minutes ago

    Seems Cloudflare is now doing this for scraping, so makes sense to continue down the pipeline!

  • refulgentis 9 minutes ago

    This paper doesn't make any sense. For background, I've maintained an AI client that's cross-platform, cross-provider, and integrates llama.cpp since 2022. I don't know why they think "agents" don't share prefill work: paid providers cache on the prefill text, llama.cpp does the same, and I specifically hooked up llama.cpp so it can do subsets as well, i.e. all the agents would reuse the cache.

    It reads like it started from an underspecification of "agents" crossed with a strain of pop-wisdom about "KV cache" that I've seen enter mainstream discourse over the past 3 months and that is Not Even Wrong, and then they solved a non-existent problem.

    EDIT: flagging, since the rest of the comments are either requesting a primer on terms or pointing out that it makes errors in even more obvious ways.

    • christianqchung a minute ago

      I don't think Luoyuan Zhang is necessarily doing this, but I'm pretty sure lots of people are using arXiv as a glorified blog and hoping no one notices.

  • tonetegeatinst an hour ago

    Does anyone have a good recommendation for an explainer or primer on the KV cache?

    • plutomeetsyou 35 minutes ago

      convert this question to KV cache and give it to your agent

  • sghiassy an hour ago

    A truly global singleton

  • root-parent 2 hours ago

    Lambda computing for prompts?