11 comments

  • bob1029 9 hours ago

    You could quickly wire up one of the LLM APIs as a SQLite application-defined function if you wanted to play around with the idea of very slow and expensive queries:

    https://sqlite.org/appfunc.html

    https://learn.microsoft.com/en-us/dotnet/standard/data/sqlit...

    Maybe stick with the aggregate variety of function at first if you don't want any billing explosions. I'd probably begin with something like LLM_Summary() and LLM_Classify(). The summary could be an aggregate, and the classify could be a scalar. Being able to write a query like:

      SELECT LLM_Summary(Comment)
      FROM Users
      WHERE datetime(Updated_At) >= datetime('now', '-1 day');
    
    is more expedient than wiring up the equivalent code pile each time. The aggregation method's internals could handle hierarchical summarization, chunking, etc. Or it could throw an error back to the user so they are forced to devise a more rational query.
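
    A minimal sketch of that shape in Python, using the stdlib sqlite3 module's create_function/create_aggregate (the call_llm helper, the prompts, and the schema are placeholder assumptions, not a tested implementation):

      import sqlite3

      def call_llm(prompt: str) -> str:
          """Hypothetical helper -- point this at whatever LLM API you use."""
          raise NotImplementedError

      def llm_classify(text, labels):
          # Scalar UDF: one API call per row, so cost scales with row count.
          return call_llm(f"Classify into one of [{labels}]: {text}")

      class LLMSummary:
          # Aggregate UDF: collect values in step(), make one call in finalize().
          def __init__(self):
              self.rows = []

          def step(self, value):
              if value is not None:
                  self.rows.append(str(value))

          def finalize(self):
              # Hierarchical summarization / chunking could live here.
              return call_llm("Summarize:\n" + "\n".join(self.rows))

      conn = sqlite3.connect("app.db")
      conn.create_function("LLM_Classify", 2, llm_classify)
      conn.create_aggregate("LLM_Summary", 1, LLMSummary)

    With those registered, a query like the one above can run against that connection.
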
    • simonw 8 hours ago

      I tried that a couple of years ago with a CLI tool that uses Python functions called from SQLite - it worked with GPT-3.5: https://simonwillison.net/2023/Apr/29/enriching-data/

      Example usage:

        openai-to-sqlite query database.db "
          update messages set sentiment = chatgpt(
            'Sentiment analysis for this message: ' || message ||
            ' - ONLY return a lowercase string from: positive, negative, neutral, unknown'
          )
          where sentiment not in ('positive', 'negative', 'neutral', 'unknown')
            or sentiment is null
        "
      
      I haven't revisited the idea for fear of what it could cost if you ran it against a large database, but given the crashing prices of Gemini Flash, GPT-4o mini, etc., maybe it's worth another look!
      • datadrivenangel 5 hours ago

        The API call is the same price per token regardless of how you run it!

      • arthurcolle 6 hours ago

        If you switch the model to GPT-4.5-preview, you can spend a lot of money very quickly.

    • falcor84 8 hours ago

      I love that, and would maybe even add a model-price parameter to each such function call. Perhaps a number in the range 1-10, with 1 being the cheapest model available and 10 being the best available (whatever the price), and then environment settings would choose the actual model to use for each value. And perhaps an overload fail-safe that switches all of the queries to cheaper models as a form of throttling.
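
      A rough sketch of what that tier mapping could look like (the env var names, model names, and throttling rule here are purely illustrative assumptions):

        import os

        # Hypothetical tier -> model mapping, overridable via environment settings.
        MODEL_TIERS = {
            1: os.environ.get("LLM_TIER_1", "gemini-flash"),
            5: os.environ.get("LLM_TIER_5", "gpt-4o-mini"),
            10: os.environ.get("LLM_TIER_10", "gpt-4.5-preview"),
        }

        def resolve_model(tier: int, overloaded: bool = False) -> str:
            # Overload fail-safe: throttle every query down to the cheapest tier.
            if overloaded:
                tier = 1
            # Otherwise pick the nearest configured tier at or below the request.
            eligible = [t for t in MODEL_TIERS if t <= tier]
            return MODEL_TIERS[max(eligible) if eligible else 1]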

    • datadrivenangel 5 hours ago

      Is there any reason why you can't build a full agent in SQL? SQLite with a little sauce should be competitive with LangChain if you think about it a bit.

      • bob1029 5 hours ago

        I don't see why not. Recursive CTEs allow you to express pretty much anything you desire.

        You could also expose additional functions corresponding to the external tools that you would like the agent to have access to and pass these as arguments to additional UDFs.

        You could also lean into a data-driven approach: express much of the configuration in tables, then use the enhanced SQL dialect to tie everything together at runtime. In SQLite, arbitrary queries can be run during UDF execution, so you could pull tool descriptions, parameter lists, enums, etc. from ordinary SQL tables without having to pass explicit args to the functions.
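
        As a toy illustration of the recursive-CTE loop (LLM_Call is a hypothetical UDF, and the five-step cap is an arbitrary guard against runaway API spend):

          import sqlite3

          conn = sqlite3.connect(":memory:")

          def llm_call(transcript: str) -> str:
              """Hypothetical scalar UDF; wire this to a real LLM API."""
              raise NotImplementedError

          conn.create_function("LLM_Call", 1, llm_call)

          # The recursive CTE is the agent loop: each step appends the model's
          # reply to a running transcript, up to a hard step cap.
          AGENT_LOOP = """
          WITH RECURSIVE agent(step, transcript) AS (
              SELECT 0, 'USER: ' || :question
              UNION ALL
              SELECT step + 1,
                     transcript || char(10) || 'ASSISTANT: ' || LLM_Call(transcript)
              FROM agent
              WHERE step < 5
          )
          SELECT transcript FROM agent ORDER BY step DESC LIMIT 1;
          """

          final = conn.execute(AGENT_LOOP, {"question": "Plan my query"}).fetchone()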

  • ofrzeta 7 hours ago

    Here is the implementation: https://github.com/dsg-polymtl/flockmtl

  • simlevesque 8 hours ago

    I'll try this at work tomorrow!