1 comments

  • gkanellopoulos 6 hours ago

    Your gradual transition from v1 to v3 is a common pattern I've seen elsewhere. Project teams usually start with retrieval, then hit recall-quality problems, at which point they start wondering whether to let the LLM and its context window take over. That's a natural and instinctive move, but imho there are two issues with it. First, the LLM decides what matters, and that decision is irreversible. Second, it does not scale well over time: a couple of months later, when the user asks "What did I ask you to remind me about X 3 months ago?", the summary may have rotated that detail out by then.

    I agree that there is a fundamental issue with v1-style retrieval, but in my view it is not the scoring formula; it is that similarity search mixes semantically related data with genuinely valuable data. For example, a memory about "surfing last weekend" and a memory about "wanting to surf one day in Hawaii" will both score high for the question "What outdoor activities do I like?". But for the question "What did I do last weekend?", only one is useful, while both will appear in the injected context. One way to address this is to introduce more retrieval dimensions, like keyword matching (BM25), entity-aware scoring, and temporal signals, and then use those to determine which memories are truly relevant to the user's question. This of course adds cost during ingestion, but in general async ingestion is underrated: users expect near-instant responses, while ingestion can afford to be slower.
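    To make the multi-dimension idea concrete, here is a minimal sketch of a hybrid scorer. Everything is illustrative: the toy 2-d embeddings, the `MEMORIES` store, the weights, and the keyword function (a crude term-overlap stand-in for real BM25) are all assumptions, not anyone's actual system.

```python
import math
import time

# Hypothetical memory store: toy embedding, raw text, creation timestamp.
MEMORIES = [
    {"text": "went surfing last weekend",
     "embedding": [0.9, 0.1], "ts": time.time() - 5 * 86400},
    {"text": "wants to surf in Hawaii one day",
     "embedding": [0.85, 0.15], "ts": time.time() - 90 * 86400},
]

def cosine(a, b):
    # Semantic similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_overlap(query, text):
    # Crude stand-in for BM25: fraction of query terms found in the memory.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def temporal_score(ts, now, half_life_days=30.0):
    # Exponential decay: recent memories score near 1, old ones near 0.
    age_days = (now - ts) / 86400
    return 0.5 ** (age_days / half_life_days)

def hybrid_score(query, query_emb, memory, now,
                 w_sem=0.5, w_kw=0.3, w_time=0.2):
    # Weighted blend of the three retrieval dimensions.
    return (w_sem * cosine(query_emb, memory["embedding"])
            + w_kw * keyword_overlap(query, memory["text"])
            + w_time * temporal_score(memory["ts"], now))

now = time.time()
ranked = sorted(MEMORIES, reverse=True,
                key=lambda m: hybrid_score("what did I do last weekend",
                                           [0.9, 0.1], m, now))
```

    With pure cosine similarity both memories score nearly identically; adding the keyword and temporal terms pushes "went surfing last weekend" to the top for the "last weekend" question, which is exactly the disambiguation similarity search alone can't do.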

    If I may ask, have you done any benchmarking of the v3 approach? It would be interesting to see how a v3-style solution handles factual vs. general preference questions; that is usually a tricky case for memory systems.