If understanding correctly, Absurd (by the Pi LLM harness devs) minimizes the pure db approach as much as possible. I only just started getting into the topic myself, though.
Without reading any of the doco, it appears to be a job definition called invoice-approval-pipeline that runs every 5 seconds.
The steps are:
1. Get all the pending invoices
2. Set their state to "processing"
3. Call out to an external service/process to do the actual processing, wait for a response.
4. If the response is OK, do something
5. Wait 5 seconds and then start again.
Not sure I love the syntax and the way SQL is embedded between the $$
But it is in the database, can be updated and modified in the same way as all the other stored procedures/functions, allows job control, I assume other control structures for parallel steps etc.
"dollar quoting" is the PostgreSQL way to quote strings with quotes, avoiding double quoting or escape characters. I like to use the tagged version of it, like $sql$ SELECT ... $sql$ to describe what is inside.
Contributor here - at Microsoft we've built AI workflows on pg_durable and seen it substantially reduce code and increase reliability. Agree that the DSL ergonomics can be improved. Our pipelines use a higher level language and therefore simplified, but pg_durable is meant to solve a wider array of problems. We're happy to take suggestions for improvements.
Somebody else in the thread brought up the benefit of snapshotting a database at a point in time stores not only the state of execution but also the code, etc. That is a unique benefit I'd be interested in exploring over storing your orchestration outside of the database.
Not trying to dismiss the project - it looks like a lot of hard work has gone in and somebody has a use for it. I just come from an airflow style external orchestrator frame of mind that manages durability state in postgres but keeps the control flow out. Sorry if I came off as a bit snarky
Can anyone explain why I would want to use this over an orchestration tool that lives outside the DB? Read through the Readme and some of the examples, I still don't get it.
Snapshot PITR of your database means everything restores including the durable jobs at the PIT.
Don't need to synchronize the backups with anything else that is part of the same data store, good for ETL pipelines and other state machine type jobs.
If your ETL is mostly SQL anyway, then having the actual job being run on the same server helps as well.
You can have well-integrated applicative workflows (eg: progress report on a permalink in your front end app), app-restart-proof resumable workflows, and it avoids adding an extra piece of infrastructure.
We use Postgres for that on https://transport.data.gouv.fr (Elixir app which does a fair bit of processing), and it helps.
Not familiar yet with pg_durable though, but I have used or implemented similar solutions and can relate.
Contributor here. At Microsoft, our Postgres customers seem to split pretty evenly into 2 camps, those that want to do as much as they can in the database, and those that agree with your take - want to keep apps and compute outside the DB.
Looks pretty good but I wonder why they didn’t build it on pgmq? If you’re on elixir I maintain a DAG package around this (based on and compatible with pgflow.dev which is TS/Deno).
I'm trapped on Azure at work and we're constantly waiting for Azure pg to catch up with modernity.
For example, you cant use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...
Also for example, you dont get ultra-wide high dimensionality vectors.
It is nice they are open sourcing pg_durable, but how about adopting table stakes I'd get with AWS?
2026 is the year of the Postgres queue! (DBOS[0], pgQue[1]) It's awesome that the community is contributing this and giving us the option to use it.
As an ex-app engineer though, I kind of prefer my queue logic to be in code, in Git, but maybe with the right tooling, you can change my mind. :)
[0]: https://www.dbos.dev/
[1]: https://github.com/NikolayS/pgque
If understanding correctly, Absurd (by the Pi LLM harness devs) minimizes the pure db approach as much as possible. I only just started getting into the topic myself, though.
https://github.com/earendil-works/absurd
> When not to use it > … > The workflow mostly lives outside Postgres and spans many heterogeneous systems.
How is this project at all comparable to something like Temporal? Am I misunderstanding the limitation implied by this particular recommendation?
I aggree - I'm not understanding the value of the project either if you look at the example here https://github.com/microsoft/pg_durable/blob/main/examples/i...
It's an interesting technical achievement I guess, but it's very bizarre to try and read this
Without reading any of the doco, it appears to be a job definition called invoice-approval-pipeline that runs every 5 seconds.
The steps are:
1. Get all the pending invoices
2. Set their state to "processing"
3. Call out to an external service/process to do the actual processing, wait for a response.
4. If the response is OK, do something
5. Wait 5 seconds and then start again.
Not sure I love the syntax and the way SQL is embedded between the $$
But it is in the database, can be updated and modified in the same way as all the other stored procedures/functions, allows job control, I assume other control structures for parallel steps etc.
Gonna go read the doco now.
"dollar quoting" is the PostgreSQL way to quote strings with quotes, avoiding double quoting or escape characters. I like to use the tagged version of it, like $sql$ SELECT ... $sql$ to describe what is inside.
Contributor here - at Microsoft we've built AI workflows on pg_durable and seen it substantially reduce code and increase reliability. Agree that the DSL ergonomics can be improved. Our pipelines use a higher level language and therefore simplified, but pg_durable is meant to solve a wider array of problems. We're happy to take suggestions for improvements.
Somebody else in the thread brought up the benefit of snapshotting a database at a point in time stores not only the state of execution but also the code, etc. That is a unique benefit I'd be interested in exploring over storing your orchestration outside of the database.
Not trying to dismiss the project - it looks like a lot of hard work has gone in and somebody has a use for it. I just come from an airflow style external orchestrator frame of mind that manages durability state in postgres but keeps the control flow out. Sorry if I came off as a bit snarky
I hope it could be used in the future to export pg_dump formated exports to s3.
One would be able to trigger maintenance jobs via simple lambda functions whose duration is capped.
Seems like an interesting idea to add durability and resumability to lengthy cron jobs.
Is this an open sourcing of something they use internally? My first thought on durable jobs was GHA aka Azure Devops.
Can anyone explain why I would want to use this over an orchestration tool that lives outside the DB? Read through the Readme and some of the examples, I still don't get it.
Snapshot PITR of your database means everything restores including the durable jobs at the PIT.
Don't need to synchronize the backups with anything else that is part of the same data store, good for ETL pipelines and other state machine type jobs.
If your ETL is mostly SQL anyway, then having the actual job being run on the same server helps as well.
It’s sometimes convenient if database is the only ”stateful” component in architecture.
Also if all the "state" is in one database, then you have better chance of getting consistent backups.
You can have well-integrated applicative workflows (eg: progress report on a permalink in your front end app), app-restart-proof resumable workflows, and it avoids adding an extra piece of infrastructure.
We use Postgres for that on https://transport.data.gouv.fr (Elixir app which does a fair bit of processing), and it helps.
Not familiar yet with pg_durable though, but I have used or implemented similar solutions and can relate.
Contributor here. At Microsoft, our Postgres customers seem to split pretty evenly into 2 camps, those that want to do as much as they can in the database, and those that agree with your take - want to keep apps and compute outside the DB.
This feels like the wrong solution to an age old problem solved by the DAG schedulers like Apache Airflow for a while now.
Why would I want to store my control flow in the database and not in code? It feels strange.
Not trying to dismiss the project, I'm just not getting it yet I think.
Looks pretty good but I wonder why they didn’t build it on pgmq? If you’re on elixir I maintain a DAG package around this (based on and compatible with pgflow.dev which is TS/Deno).
https://github.com/agoodway/pgflow
(pg_durable committer here)
The provider is an extensibility point. We just shipped the simplest version of it. Happy to take contribs if someone sends a pgmq based provider!
Cool! I maintain https://postgresisenough.dev, I'd love to get a PR for pg_durable up to include it: https://github.com/agoodway/postgresisenough