Can I Buy Your KV Cache?

(arxiv.org)

22 points | by MediaSquirrel 2 hours ago

14 comments

lumost an hour ago

The KV cache is order-dependent: the keys and values cached for each token depend on the tokens that come before it in the context.
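
To make the order dependence concrete, here is a minimal sketch with a hypothetical toy model: 2-D embeddings, one causal attention head with identity projections, and no positional encoding. The token "B" has the same input embedding in both prompts, but the key it contributes at layer 2 depends on whether "A" or "C" came before it. Real models also inject position, so in practice even first-layer keys shift when a token moves.

```python
import math
from typing import Dict, List

# Hypothetical toy vocabulary: 2-D embeddings, no positional encoding.
EMBED: Dict[str, List[float]] = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(hs: List[List[float]]) -> List[List[float]]:
    """One head with identity Q/K/V projections; position i sees positions 0..i."""
    out = []
    for i, q in enumerate(hs):
        visible = hs[: i + 1]
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in visible])
        out.append([sum(w * v[d] for w, v in zip(weights, visible)) for d in range(len(q))])
    return out


def layer2_keys(tokens: List[str]) -> List[List[float]]:
    # With identity projections, the keys cached for layer 2 are the layer-1 outputs.
    return causal_attention([EMBED[t] for t in tokens])


k_after_a = layer2_keys(["A", "B"])[-1]  # "B" preceded by "A"
k_after_c = layer2_keys(["C", "B"])[-1]  # "B" preceded by "C"
print(k_after_a, k_after_c)  # different vectors for the same token
```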

There are some transformation approaches for reusing the KV cache across inferences, but none is in wide use because of accuracy concerns after the transformation.
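
One transformation that is exact for the positional part: in models that use rotary position embeddings (RoPE), a key cached at one position can be moved to another by rotating it through the position difference, because rotations compose. The sketch below assumes a single 2-D rotary pair with a made-up frequency THETA. It only fixes position; it cannot remove the key's dependence on whatever tokens preceded it in its original context, which is presumably where the accuracy concerns come from.

```python
import math
from typing import List

THETA = 0.1  # hypothetical single rotary frequency (real RoPE uses one per dimension pair)


def rotate(vec: List[float], angle: float) -> List[float]:
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return [x * c - y * s, x * s + y * c]


def rope(raw_key: List[float], pos: int) -> List[float]:
    # Encode position by rotating the key through pos * THETA.
    return rotate(raw_key, pos * THETA)


def shift_cached_key(cached: List[float], old_pos: int, new_pos: int) -> List[float]:
    # Rotations compose, so rotating by the position delta relocates the cached key.
    return rotate(cached, (new_pos - old_pos) * THETA)


raw_key = [0.3, -1.2]  # hypothetical pre-rotation key
cached_at_5 = rope(raw_key, 5)
moved_to_42 = shift_cached_key(cached_at_5, 5, 42)
fresh_at_42 = rope(raw_key, 42)
print(moved_to_42, fresh_at_42)  # equal up to floating-point error
```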

Eridrus an hour ago

The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.

dgellow an hour ago

Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?

evrydayhustling an hour ago

All major model providers offer prefix caching, which is this.
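
Prefix caching only saves work when a new request begins with exactly the same tokens as an earlier one. A minimal sketch, assuming a hypothetical PrefixCache class that stores a placeholder string instead of real K/V tensors and keeps every prefix in a dict (real serving systems typically use block-based or tree-shaped storage instead):

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Hypothetical toy prefix cache: maps token-prefix tuples to placeholder KV state."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], str] = {}

    def longest_prefix(self, tokens: List[str]) -> int:
        # Try the longest candidate first; only exact prefix matches can be reused.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._store:
                return n
        return 0

    def prefill(self, tokens: List[str]) -> int:
        """Return how many tokens had to be computed rather than reused."""
        reused = self.longest_prefix(tokens)
        for n in range(reused + 1, len(tokens) + 1):
            self._store[tuple(tokens[:n])] = f"kv[{n}]"  # stand-in for real K/V tensors
        return len(tokens) - reused


cache = PrefixCache()
system = ["You", "are", "a", "helpful", "agent"]
computed_first = cache.prefill(system + ["task1"])       # nothing cached yet
computed_second = cache.prefill(system + ["task2"])      # shared system prompt is reused
computed_reordered = cache.prefill(["task2"] + system)   # same tokens, different order
print(computed_first, computed_second, computed_reordered)
```

Two agents sharing a system prompt recompute only their differing suffix, while moving the task token to the front defeats the cache entirely, which is the order-dependence point made at the top of the thread.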

lumost 30 minutes ago

No, reusing segments of the KV cache for different purposes in an order-independent manner is an active research area.

dgellow 20 minutes ago

Any keyword or paper I can search for?

mistercow 43 minutes ago

> Then the part that matters: where the KV lives

When your abstract was clearly generated by an LLM and not curated to at least make it sound human, it does not make me want to read your paper.

TuringNYC 10 minutes ago

Seems Cloudflare is now doing this for scraping, so makes sense to continue down the pipeline!

refulgentis 9 minutes ago

This paper doesn't make any sense. For background, I've maintained an AI client that's cross-platform, cross-provider, and integrates llama.cpp since 2022. I don't know why they think "agents" don't share prefill work: paid providers cache on the prefill text, llama.cpp does the same, and I specifically hooked up llama.cpp so it can do subsets as well, i.e., all the agents would reuse the cache.

It reads like it started from an underspecification of "agents" combined with a strain of pop wisdom about the "KV cache" that I've seen enter mainstream discourse over the past 3 months and that is Not Even Wrong; then they solved a non-existent problem.

EDIT: based on the rest of the comments, which either request a primer on the terms or point out that it makes errors in even more obvious ways, flagging.

christianqchung a minute ago

I don't think Luoyuan Zhang is necessarily doing this, but I'm pretty sure lots of people are using arxiv as a glorified blog and hoping no one notices.

tonetegeatinst an hour ago

Does anyone have a good recommendation for an explainer or primer on the KV cache?

plutomeetsyou 35 minutes ago

convert this question to KV cache and give it to your agent

sghiassy an hour ago

A truly global singleton

root-parent 2 hours ago

Lambda computing for prompts?