3 points | by zippode 4 days ago ago
4 comments
interesting approach building this in rust the latency argument makes sense for something sitting inline with every LLM request. curious how it handles multi-turn context where injection might be spread across messages rather than a single prompt.
one I forgot, please visit the benchmark of Isartor and see the deflection rate to reduce LLM tokens: https://github.com/isartor-ai/Isartor/tree/main/benchmarks
[dead]
interesting approach building this in rust the latency argument makes sense for something sitting inline with every LLM request. curious how it handles multi-turn context where injection might be spread across messages rather than a single prompt.
one I forgot, please visit the benchmark of Isartor and see the deflection rate to reduce LLM tokens: https://github.com/isartor-ai/Isartor/tree/main/benchmarks
[dead]
[dead]