Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens

(github.com)

40 points | by zdkaster 6 hours ago

26 comments

tuo-lei 3 minutes ago

the bigger problem is agents defaulting to the broadest command possible. kubectl get -o yaml when a jsonpath query would give 1/50th the tokens. filtering after the fact works, but you're still paying for the round trip. better to teach the agent to ask narrow questions in the first place.

cityofdelusion 6 minutes ago

This is a nice little project but I’m weary of sensationally inaccurate titles for stuff like this and the infamous caveman mode. It doesn’t save 91% of tokens: in one use case, it cut 91% of the tokens in the raw CLI output. I am being pedantic about this because these sorts of claims go viral and are inaccurate.

A proper benchmark will compare a large sample of identical prompting with and without the tool, against a specific harness. Once you apply Amdahl’s law, there is no way this saves 91% of tokens holistically, which the title implies.
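The Amdahl's law point can be made concrete with a little arithmetic. This is a minimal sketch, assuming CLI output makes up a fraction `cli_share` of all tokens in a session and the filter removes a fraction `filter_reduction` of only that part. The function name and the 30% share are illustrative assumptions, not measurements from the project.

```python
def overall_token_saving(cli_share: float, filter_reduction: float) -> float:
    """Fraction of total tokens saved when only the CLI-output share is filtered."""
    if not (0.0 <= cli_share <= 1.0 and 0.0 <= filter_reduction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    # Only the CLI-output share shrinks; prompts, reasoning, and edits are untouched.
    return cli_share * filter_reduction


# Hypothetical session: 30% of tokens are raw CLI output (assumed, not measured),
# and the filter removes 91.8% of that output.
saving = overall_token_saving(cli_share=0.30, filter_reduction=0.918)
print(f"overall saving: {saving:.1%}")
```

Under that assumed 30% share, the overall saving is about 27.5%, and it only approaches 91.8% if nearly every token in the session is CLI output.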

I work in a non-tech company and these sorts of things keep going viral, with no understanding and with no comprehension of what is actually going on. Engineering is gone and cargo cult magical incantations are in.

jemmyw an hour ago

I've tried rtx and lean-ctx and these tools seem to end up confusing the agent more than helping. Any saving is irrelevant if the agent decides to work around the tool and makes even more calls than it would otherwise.

I don't know about cost saving, but if it's keeping the context size down I've had a lot better results using subagents to keep a higher order conversation clean for longer.

lxn 7 minutes ago

I looked into lean-ctx and decided not to use it. It has a very specific use case, and it's good when your interaction with the repository is read-only. When you want to edit, then the model has to read the whole file anyway. It's a cool tool, but it has a very narrow use case where it delivers the performance it claims.

exitb an hour ago

Subagents help with costs too, as they can run on much cheaper models.

alex7o 2 hours ago

I would like to see a deeper comparison with alternatives like rtk, which are already fast and written in Rust. The previous comments also mentioned a known problem with rtk: it sometimes strips the thing that the LLM needs (or expects), causing more work to happen, not less.

zdkaster 2 hours ago

In terms of token-saving performance, it should be on par with rtk, since it is basically the same idea. The major difference is that rtk bundles hundreds of filters' logic and leaves no room for the user to adjust them without maintaining a user-owned fork or opening a pull request, while lowfat takes the opposite architectural approach: it removes almost all filter logic from the binary and separates user filters into a plugin system.

wood_spirit an hour ago

I have my own llm wrapping harness, which does this and has a few more tricks. For example, it doesn’t have a lot of mcp but it does have search_mcp and load_mcp tools (and search_skills) so the llm can find what it needs when it needs it without bloating the normal baseline context. The LLMs have proved really good at using them. There is also a waypoint tool they can use to record their thinking in the context without it being the final output. Am thinking about a search_expert to find colleagues it can bring into conversations too. And a lot of other stuff.

Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt - that way, the llm will be able to query and read more using built-in tools without reissuing the big tool call.
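The truncation tip could look roughly like this. It is a minimal sketch, not the commenter's actual harness: `truncate_with_pointer`, the 2000-character default, and the wording of the note are all hypothetical.

```python
import tempfile


def truncate_with_pointer(output: str, max_chars: int = 2000) -> str:
    """Return short output unchanged; otherwise save it all and return a truncated view."""
    if len(output) <= max_chars:
        return output
    # Save the complete text so the model can grep or read it with its built-in tools.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", prefix="tool_output_", delete=False, encoding="utf-8"
    ) as fh:
        fh.write(output)
        full_path = fh.name
    omitted = len(output) - max_chars
    # The trailing note tells the model where the rest lives.
    return (
        output[:max_chars]
        + f"\n[truncated: {omitted} more characters; full text is available in {full_path}]"
    )
```

The file is deliberately not deleted on close, so cleaning up old temp files is left to the harness.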

itsdesmond an hour ago

Have terms been established to describe these types of tools? How do I refer to small utilities that perform specific transformations on LLM behavior? "CLI filter" seems pretty good for describing this tool conversationally, but not so much when searching; those are some low-cardinality keywords.

threecheese an hour ago

The docs are missing any examples of what this does, instead showing _how_ it works - and only for the codebase itself, rather than the behavior of the app.

What would be useful:

  - examples of text that can be filtered, and why that would be valuable
  - a data flow diagram of runtime behavior, showing how filtering removes unnecessary context

zdkaster 41 minutes ago

Thanks for your feedback. Will put this in place. Meanwhile, please check out the architecture doc and the plugin doc. The plugin doc may give a bit of insight into what it does.

tegiddrone 34 minutes ago

Still learning myself, but I've seen MCP tools just lightly wrap upstream JSON-body REST APIs. Works. But not only does the JSON structure cost more tokens, the model often needs only a small subset of the fields in the payload.
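Keeping only the fields the model needs might be sketched like this. It assumes the payload is a JSON object or a list of objects and that the wanted keys are top-level; `project_fields` and the sample field names are hypothetical and not part of any MCP library or of lowfat.

```python
import json


def project_fields(raw_body: str, keep: set[str]) -> str:
    """Keep only the top-level keys in `keep` for a JSON object or list of objects.

    Falls back to the raw body when it is not JSON or has an unexpected shape.
    """
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return raw_body
    if isinstance(data, dict):
        slim = {k: v for k, v in data.items() if k in keep}
    elif isinstance(data, list) and all(isinstance(item, dict) for item in data):
        slim = [{k: v for k, v in item.items() if k in keep} for item in data]
    else:
        return raw_body
    # Compact separators drop the whitespace that pretty-printed JSON would add.
    return json.dumps(slim, separators=(",", ":"))


# Hypothetical upstream response with fields the model rarely needs.
body = json.dumps([{"id": 1, "name": "a", "created_at": "2024-01-01", "links": {"self": "/items/1"}}])
print(project_fields(body, {"id", "name"}))
```

Returning the raw body on a parse failure is a conservative choice: the model gets the original text, not an empty result.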

zdkaster 25 minutes ago

To be safe, if you need the full JSON, I would add a conditional passthrough that returns the original raw output. Otherwise, you need to handle selecting objects using Python via the filter plugin.

devdoc83 4 hours ago

How do you handle the risk of stripping out the exact stack trace the agent needed? That seems like the hard tradeoff here.

zdkaster 2 hours ago

It supports strip aggressiveness levels. You can tune 3 levels for each template output of your stack trace using the lowfat-filter DSL, a shell script, or Python.

ramon156 2 hours ago

In a perfect world the LLM needs to be very explicit on what it wants to read

nixpulvis 2 hours ago

The LLMs already do that themselves with `tail` all the time. There's a lot of room for improvement on top of that. Though they usually figure it out after a few tries. I often just paste the errors from manual runs myself anyway.

itsthecourier 3 hours ago

gonna ask the same... so far, has it been manually choosing what's useful in each command for the agents?

zdkaster 2 hours ago

It takes a bit of effort to do long-term adjustment and tuning for the CLI commands your agent commonly calls. It kinda needs to evolve on a day-to-day basis. But the agent itself can help with tuning this.

fcanesin an hour ago

I am thinking that a small tool that simply refuses to pass large CLI output to the LLM and warns it to filter the results before reading would achieve this better, as the LLM would be forced into thinking and writing the filter itself.
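A gate like the one described could be as small as this. It is a minimal sketch under assumed names: `gate_output`, the 200-line default, and the warning text are hypothetical and not a lowfat feature.

```python
def gate_output(output: str, max_lines: int = 200) -> str:
    """Pass small outputs through; replace large ones with a warning to narrow the command."""
    line_count = len(output.splitlines())
    if line_count <= max_lines:
        return output
    # Withhold the content entirely so the model has to write its own filter.
    return (
        f"[output withheld: {line_count} lines exceeds the {max_lines}-line limit. "
        "Re-run the command with a filter such as grep, head, tail, or a narrower query.]"
    )
```

The tradeoff is one extra round trip per oversized command, in exchange for the model choosing what to keep.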

zdkaster an hour ago

I simply use an LLM to create filters for my personal use. I have already put that specific instruction in the plugin doc, in case you are interested.

pradeep1177 an hour ago

Would this have any impact on the response quality from the agent?

CharlesW 36 minutes ago

Yes, and never for the better.

zdkaster 30 minutes ago

Can you elaborate on why that would be so?

zdkaster 39 minutes ago

Frankly, not at all.

pradeep1177 6 minutes ago

I have a suspicion that the model would miss more context unless you are very precise about what FAT means in each context. However, loved the idea.