11 comments

  • LTL_FTC an hour ago

    It sounds like you don’t need immediate LLM responses and can batch process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls at all. Today’s local models are quite good. I started off with CPU-only inference and even that was fine for my pipelines.
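
    For illustration only, a minimal sketch of what a nightly local pass could look like using the Ollama Python client (the model name and prompt are placeholders, not anything from the original pipeline):

      # Nightly pass against a locally served model; assumes `pip install ollama`
      # and an Ollama server already running with the chosen model pulled.
      import ollama

      def classify_feedback(items):
          results = []
          for text in items:
              resp = ollama.chat(
                  model="llama3.1:8b",  # placeholder; any local model works
                  messages=[
                      {"role": "system",
                       "content": "Label this feedback as bug, feature request, or noise."},
                      {"role": "user", "content": text},
                  ],
              )
              results.append((text, resp["message"]["content"]))
          return results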

  • 44za12 an hour ago

    This is the way. I actually mapped out the decision tree for this exact process and more here:

    https://github.com/NehmeAILabs/llm-sanity-checks

  • gandalfar an hour ago

    Consider using z.ai as a model provider to further lower your costs.

    • viraptor an hour ago

      Or MiniMax - the m2.1 release didn't make a big splash in the news, but it's really capable.

    • tehlike an hour ago

      This is what I was going to suggest too.

    • DANmode an hour ago

      Do they or any other providers offer any improvement on the often-chronicled variability in quality/effort from the two major services, e.g. during peak hours?

  • dezgeg an hour ago

    Are you also adding the proper prompt cache control attributes? I think the Anthropic API still doesn't do it automatically.
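
    For reference, a rough sketch of what marking a shared system prompt as cacheable looks like with the Anthropic Python SDK (the prompt text and model are placeholders):

      # Mark the long, repeated part of the prompt as cacheable so later
      # requests reuse it instead of paying for the full tokens again.
      import anthropic

      SHARED_INSTRUCTIONS = "Very long classification instructions..."  # placeholder

      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

      message = client.messages.create(
          model="claude-3-5-haiku-latest",
          max_tokens=256,
          system=[
              {
                  "type": "text",
                  "text": SHARED_INSTRUCTIONS,
                  # the cache-control attribute in question; caching only kicks in
                  # once the block is above the minimum cacheable token count
                  "cache_control": {"type": "ephemeral"},
              }
          ],
          messages=[{"role": "user", "content": "Example feedback item to classify."}],
      )
      print(message.content[0].text)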

  • DeathArrow 21 minutes ago

    You can also try to use cheaper models like GLM, DeepSeek, or Qwen, at least partially.

  • arthurcolle 8 hours ago

    Can you discuss a bit more of the architecture?

    • ok_orco 7 hours ago

      Pretty straightforward. Sources dump into a queue throughout the day, regex filters the obvious junk ("lol", "thanks", bot messages never hit the LLM), then everything gets batched overnight through Anthropic's Batch API for classification. Feedback gets clustered against existing pain points or creates new ones.

      Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.
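
      Roughly something like this, heavily simplified (the names, regex, and model below are illustrative, not the real code):

        # Nightly pass: regex pre-filter, then one Message Batches submission
        # for everything that survives the filter.
        import re
        import anthropic

        JUNK = re.compile(r"^(lol|thanks|thank you|\+1|ok)\W*$", re.IGNORECASE)

        def nightly_batch(feedback_items):
            # Drop the obvious junk so it never hits the LLM at all.
            keep = [(i, text) for i, text in enumerate(feedback_items)
                    if not JUNK.match(text.strip())]

            client = anthropic.Anthropic()
            batch = client.messages.batches.create(
                requests=[
                    {
                        "custom_id": f"feedback-{i}",
                        "params": {
                            "model": "claude-3-5-haiku-latest",
                            "max_tokens": 200,
                            "messages": [{
                                "role": "user",
                                "content": f"Cluster this feedback into a pain point:\n\n{text}",
                            }],
                        },
                    }
                    for i, text in keep
                ]
            )
            return batch.id  # poll later; results are fetched once processing ends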