You could quickly wire up one of the LLM APIs as a SQLite application-defined function if you wanted to play around with the idea of very slow and expensive queries.
Maybe stick with the aggregate variety of function at first if you don't want any billing explosions. I'd probably begin with something like LLM_Summary() and LLM_Classify(). The summary could be an aggregate, and the classify could be a scalar. Being able to write a query like:
SELECT LLM_Summary(Comment)
FROM Users
WHERE datetime(Updated_At) >= datetime('now', '-1 day');
is more expedient than wiring up the equivalent code pile each time. The aggregation method's internals could handle hierarchical summarization, chunking, etc. Or throw an error back to the user so they're forced to devise a more rational query.
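Here's roughly what those two UDFs could look like with Python's sqlite3 module. This is a sketch only: llm_complete() is a made-up stub standing in for a real LLM API client, and the function names just follow the ones suggested above.

```python
import sqlite3

# Hypothetical stand-in for a real LLM API client.
def llm_complete(prompt):
    # Placeholder: a real implementation would make an API call here.
    return "positive" if "great" in prompt.lower() else "neutral"

def llm_classify(text):
    # Scalar UDF: one call per row.
    return llm_complete("Sentiment of: " + text)

class LLMSummary:
    # Aggregate UDF: collect rows, make a single call in finalize().
    # Chunking or hierarchical summarization would also live here.
    def __init__(self):
        self.rows = []
    def step(self, value):
        if value is not None:
            self.rows.append(value)
    def finalize(self):
        return llm_complete("Summarize:\n" + "\n".join(self.rows))

db = sqlite3.connect(":memory:")
db.create_function("LLM_Classify", 1, llm_classify)
db.create_aggregate("LLM_Summary", 1, LLMSummary)

db.execute("CREATE TABLE Users (Comment TEXT, Updated_At TEXT)")
db.execute("INSERT INTO Users VALUES ('This works great!', datetime('now'))")
label = db.execute("SELECT LLM_Classify(Comment) FROM Users").fetchone()[0]
print(label)  # positive
```

The aggregate shape matters for billing: one API call per group in finalize() instead of one per row.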
openai-to-sqlite query database.db "
update messages set sentiment = chatgpt(
'Sentiment analysis for this message: ' || message ||
' - ONLY return a lowercase string from: positive, negative, neutral, unknown'
)
where sentiment not in ('positive', 'negative', 'neutral', 'unknown')
or sentiment is null
"
I haven't revisited the idea for fear of what it could cost if you ran it against a large database, but given the crashing prices of Gemini Flash, GPT-4o mini, etc., maybe it's worth another look!
I love that, and would maybe even add a model price parameter on each such function call. Perhaps a number in the range 1-10, with 1 being the cheapest available and 10 being the best available (whatever the price), and then environment settings would choose the actual model to use for each value. And perhaps an overload fail-safe to switch all of the queries to cheaper models as a form of throttling.
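That tier-to-model mapping could be sketched like this; every name here (the env vars, the model names, the tier numbers) is made up for the example:

```python
import os

# Hypothetical defaults; real model names would go in env settings.
DEFAULT_TIERS = {1: "cheap-model", 5: "mid-model", 10: "best-model"}

def resolve_model(tier):
    # Overload fail-safe: one env var caps every request at a cheaper
    # tier, throttling the whole system in a single place.
    cap = int(os.environ.get("LLM_TIER_CAP", "10"))
    tier = min(max(tier, 1), cap)
    # Fall back to the closest configured tier at or below the request.
    chosen = max(t for t in DEFAULT_TIERS if t <= tier)
    return os.environ.get(f"LLM_MODEL_TIER_{chosen}", DEFAULT_TIERS[chosen])

first = resolve_model(7)             # no tier-7 config, falls back to tier 5
os.environ["LLM_TIER_CAP"] = "1"
second = resolve_model(10)           # throttled down to the cheapest tier
print(first, second)
```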
Is there any reason why you can't build a full agent in SQL? SQLite with a little sauce should be competitive with LangChain if you think a little bit.
I don't see why not. Recursive CTEs allow you to express pretty much anything you desire.
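To make the recursive-CTE claim concrete, here's a toy "agent loop" in Python's sqlite3: each recursion step feeds the previous output back through a UDF that stands in for an LLM call, stopping when it emits DONE. All names are illustrative.

```python
import sqlite3

def fake_llm_step(state):
    # Stand-in for an LLM call: counts up and stops at 3.
    n = int(state)
    return "DONE" if n >= 3 else str(n + 1)

db = sqlite3.connect(":memory:")
db.create_function("llm_step", 1, fake_llm_step)

rows = db.execute("""
    WITH RECURSIVE agent(step, output) AS (
        SELECT 0, '0'
        UNION ALL
        SELECT step + 1, llm_step(output)
        FROM agent
        WHERE output != 'DONE' AND step < 10  -- hard cap on iterations
    )
    SELECT step, output FROM agent
""").fetchall()
print(rows)  # [(0, '0'), (1, '1'), (2, '2'), (3, '3'), (4, 'DONE')]
```

The hard cap on step is the SQL equivalent of a max-iterations guard in any agent framework.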
You could also expose additional functions corresponding to the external tools that you would like the agent to have access to and pass these as arguments to additional UDFs.
You could also lean into a data-driven approach and express much of the configuration in tables, then use the enhanced SQL dialect to tie everything together at runtime. In SQLite, arbitrary queries can be run during UDF execution, so you could pull tool descriptions, parameter lists, enums, etc. from ordinary SQL tables without having to pass explicit arguments to the functions.
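A small sketch of that table-driven idea, assuming Python's sqlite3 (where at least read queries work fine from inside a UDF callback on the same connection); the tools table and its contents are invented for the example:

```python
import sqlite3

# Tool metadata lives in an ordinary table rather than in code.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tools (name TEXT PRIMARY KEY, description TEXT)")
db.execute("INSERT INTO tools VALUES ('search', 'Full-text search over documents')")

def describe_tool(name):
    # Query the tools table at call time instead of taking the
    # description as an explicit argument.
    row = db.execute(
        "SELECT description FROM tools WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else "unknown tool"

db.create_function("tool_description", 1, describe_tool)
desc = db.execute("SELECT tool_description('search')").fetchone()[0]
print(desc)  # Full-text search over documents
```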
https://sqlite.org/appfunc.html
https://learn.microsoft.com/en-us/dotnet/standard/data/sqlit...
I tried that a couple of years ago with a CLI tool that uses Python functions called from SQLite - it worked with GPT-3.5: https://simonwillison.net/2023/Apr/29/enriching-data/
The API call is the same price per token regardless of how you run it!
If you switch the model to GPT-4.5-preview you can spend a lot of money very quickly.
https://explainextended.com/2023/12/31/happy-new-year-15/
Use triggers to execute trees of tasks and you have a working agentic system!
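A minimal sketch of the trigger idea in Python's sqlite3 (table layout, UDF, and the single hard-coded child task are all invented; note SQLite's recursive_triggers pragma is off by default, so the enqueued child doesn't re-fire the trigger here):

```python
import sqlite3

done_log = []

def run_task(name):
    # Stand-in for actually doing the work (an LLM or tool call).
    done_log.append(name)
    return "done"

db = sqlite3.connect(":memory:")
db.create_function("run_task", 1, run_task)
db.execute("CREATE TABLE tasks (name TEXT, parent TEXT, status TEXT)")
db.execute("""
    CREATE TRIGGER execute_task AFTER INSERT ON tasks
    BEGIN
        -- Run the task via the UDF and record the result.
        UPDATE tasks SET status = run_task(NEW.name) WHERE rowid = NEW.rowid;
        -- Enqueue one hypothetical child step; a real system would
        -- consult a plan table to decide what comes next.
        INSERT INTO tasks SELECT NEW.name || '.child', NEW.name, 'pending'
        WHERE NEW.parent IS NULL;
    END
""")
db.execute("INSERT INTO tasks VALUES ('root', NULL, 'pending')")
final = db.execute("SELECT name, status FROM tasks ORDER BY name").fetchall()
print(final)  # [('root', 'done'), ('root.child', 'pending')]
```

A worker loop (or PRAGMA recursive_triggers = ON, with a depth guard) would then pick up the pending children.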
Here is the implementation https://github.com/dsg-polymtl/flockmtl
I'll try this at work tomorrow!