  • saschabuehrle 13 hours ago

    Hi HN,

    I'm launching cascadeflow – an open-source tool for AI model cascading that can reduce your AI provider costs by 30-65% with just 3 lines of code.

    The core insight: After a year of working with small language models and domain-specific models (especially on edge devices), I found that 80% of queries can be handled by cheaper, smaller models. Only the complex 20% actually need flagship models.

    How it works:

    1. Route queries to a cheap "drafter" model first
    2. Validate the response quality
    3. If quality passes, return it (fast + cheap)
    4. If not, escalate to an expensive "verifier" model
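
    To make the flow concrete, here is a minimal Python sketch of the drafter/verifier pattern described above. It is not cascadeflow's actual API: `cascade`, `CascadeResult`, and the three toy functions are hypothetical names made up for illustration, and the quality check is a deliberately trivial stand-in (non-empty response) for whatever validation you would really use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    text: str
    model_used: str  # "drafter" or "verifier"


def cascade(
    query: str,
    drafter: Callable[[str], str],
    verifier: Callable[[str], str],
    passes_quality: Callable[[str, str], bool],
) -> CascadeResult:
    """Try the cheap model first; escalate only if the draft fails validation."""
    draft = drafter(query)                     # step 1: cheap model
    if passes_quality(query, draft):           # step 2: validate the draft
        return CascadeResult(draft, "drafter")  # step 3: return the cheap answer
    return CascadeResult(verifier(query), "verifier")  # step 4: escalate


# Toy stand-ins (assumptions) so the sketch runs without any API keys.
def toy_drafter(query: str) -> str:
    # Pretend the small model only knows one easy answer.
    return "4" if query == "What is 2 + 2?" else ""


def toy_verifier(query: str) -> str:
    return f"[flagship answer to: {query}]"


def toy_quality_check(query: str, response: str) -> bool:
    # Hypothetical validation rule: accept any non-empty response.
    return len(response.strip()) > 0


if __name__ == "__main__":
    for q in ["What is 2 + 2?", "Summarize this 40-page contract."]:
        result = cascade(q, toy_drafter, toy_verifier, toy_quality_check)
        print(result.model_used, "->", result.text)
```

    In a real setup the two callables would wrap actual model calls, and the quality check is where most of the design effort goes: too strict and everything escalates, too lenient and weak drafts get returned.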

    We're seeing 40-85% cost savings in production workflows, with 70-80% of queries never touching the expensive model.
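
    For a rough sense of how the pass rate turns into savings, here is a back-of-the-envelope sketch. The cost model is an assumption, not a measurement: every query pays for the drafter, escalated queries additionally pay for the verifier, and the cost of the validation step itself is ignored. The prices in the example are made-up units, and `cascade_savings` is a hypothetical helper.

```python
def cascade_savings(pass_rate: float, drafter_cost: float, verifier_cost: float) -> float:
    """Fraction of cost saved versus sending every query to the verifier.

    Assumed model: cost per query = drafter_cost + (1 - pass_rate) * verifier_cost.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if verifier_cost <= 0:
        raise ValueError("verifier_cost must be positive")
    cascade_cost = drafter_cost + (1.0 - pass_rate) * verifier_cost
    return 1.0 - cascade_cost / verifier_cost


if __name__ == "__main__":
    # Hypothetical prices: the verifier costs 10x the drafter per query.
    for rate in (0.70, 0.75, 0.80):
        saved = cascade_savings(rate, drafter_cost=1.0, verifier_cost=10.0)
        print(f"pass rate {rate:.0%}: about {saved:.0%} saved")
```

    Under this model the savings depend on both the pass rate and the price gap between the two models, so a low pass rate or a drafter that is not much cheaper than the verifier can erase the benefit.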

    Available for Python and TypeScript, with integrations for n8n and LiteLLM. MIT licensed.

    GitHub: https://github.com/lemony-ai/cascadeflow

    This is Day 2 of our release sprint. We would love to hear your feedback, especially if you're dealing with high AI API costs or running models in resource-constrained environments.