Better JIT for Postgres

(github.com)

58 points | by vladich 4 hours ago

14 comments

  • swaminarayan 21 minutes ago

    Have you tested this under high concurrency with lots of short OLTP queries? I’m curious whether the much faster compile time actually moves the point where JIT starts paying off, or if it’s still mostly useful for heavier queries.

  • eru 3 hours ago

    > However, standard LLVM-based JIT is notoriously slow at compilation. When it takes tens to hundreds of milliseconds, it may be suitable only for very heavy, OLAP-style queries, in some cases.

    I don't know anything here, but this seems like a good case for ahead-of-time compilation? Or at least caching your JIT results? I can imagine much of the time you are getting more or less the same query again and again?

    • bob1029 an hour ago

      At some level the application needs to participate in the performance conversation too.

      https://www.postgresql.org/docs/current/sql-prepare.html
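
For reference, a minimal sketch of what that looks like on the application side (the table and statement names here are hypothetical):

```sql
-- Prepare once per session; the server parses and analyzes the
-- statement up front, and later EXECUTEs reuse that work (after
-- several runs, Postgres may also switch to a cached generic plan).
PREPARE get_user (int) AS
    SELECT * FROM users WHERE id = $1;

EXECUTE get_user(42);
EXECUTE get_user(43);

-- Prepared statements are per-session; drop it when done.
DEALLOCATE get_user;
```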

    • olau 2 hours ago

      Yes.

      Some years ago we ported some code from querying out the data and tallying in Python (how many are in each bucket) to using SQL to do that. It didn't speed up the execution. I was surprised by that, but I guess the Postgres interpreter is roughly the same speed as Python, which when you think about it perhaps isn't that surprising.

      But Python is truly general purpose while the core query stuff in SQL is really specialized (we were not using stored procedures). So if PyPy can get a 5x speedup, it seems to me that it should be possible to get the same kind of speedup in Postgres. I guess it needs funding and someone as smart as the PyPy people.
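
The two approaches described above can be sketched as follows — a minimal, hypothetical example using an in-memory sqlite3 database as a stand-in for Postgres (table and column names are made up):

```python
# Compare tallying rows client-side in Python vs. letting the
# database do it with GROUP BY. Same result either way; the
# difference in practice is where the per-row work happens.
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (bucket TEXT)")
conn.executemany("INSERT INTO readings VALUES (?)",
                 [("a",), ("b",), ("a",), ("c",), ("a",)])

# Approach 1: pull every row out and tally in Python.
py_counts = Counter(row[0] for row in
                    conn.execute("SELECT bucket FROM readings"))

# Approach 2: have the database tally with GROUP BY.
sql_counts = dict(conn.execute(
    "SELECT bucket, COUNT(*) FROM readings GROUP BY bucket"))

print(py_counts == sql_counts)  # both approaches agree
```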

  • sourcegrift an hour ago

    We have everything optimized, and yet somehow DB queries need to be "interpreted" at runtime. There's no reason for DB queries to not be precompiled.

  • fabian2k 2 hours ago

    The last time I looked into it my impression was that disabling the JIT in PostgreSQL was the better default choice. I had a massive slowdown in some queries, and that doesn't seem to be an entirely unusual experience. It does not seem worth it to me to add such a large variability to query performance by default. The JIT seemed like something that could be useful if you benchmark the effect on your actual queries, but not as a default for everyone.
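
For anyone wanting to try this on their own workload: `jit` and `jit_above_cost` are real Postgres settings, though the threshold value below is only illustrative:

```sql
-- Disable JIT for the current session (or set it in postgresql.conf):
SET jit = off;

-- Alternatively, keep JIT but raise the planner-cost threshold so only
-- truly expensive plans trigger compilation (default jit_above_cost
-- is 100000):
SET jit_above_cost = 1000000;
```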

    • pjmlp 2 hours ago

      That is quite strange, given that the big-boy RDBMSs (Oracle, SQL Server, DB2, Informix, ...) have all had JIT capabilities for several decades now.

      • SigmundA a few seconds ago

        The big boys all cache query plans, so the amount of time it takes to compile is not really a concern.

  • larodi 37 minutes ago

    sadly, no windows version yet AFAICT

  • asah 3 hours ago

    awesome! I wonder if it's possible to point AI at this problem and synthesize a bespoke compiler (per-architecture?) for postgresql expressions?

    • kvdveer 3 hours ago

      Two things are holding back current LLM-style AI from being of value here:

      * Latency. LLM responses are measured on the order of 1000s of milliseconds, whereas this project targets 10s of milliseconds; that's off by almost two orders of magnitude.

      * Determinism. LLMs are inherently non-deterministic. Even with temperature=0, slight variations of the input lead to major changes in output. You really don't want your DB to be non-deterministic, ever.

      • qeternity an hour ago

        > LLMs are inherently non-deterministic.

        This isn't true, and certainly not inherently so.

        Changes to input leading to changes in output does not violate determinism.

      • simonask 3 hours ago

        > 1000s of milliseconds

        Better known as "seconds"...

      • olau 3 hours ago

        The suggestion was not to use an LLM to compile the expression, but to use an LLM to build the compiler.