Looking Forward to Postgres 19: It's About Time

(pgedge.com)

37 points | by xngbuilds 2 hours ago ago

12 comments

  • larsnystrom 17 minutes ago

    Why are they storing a time period (start and end date) in the first example? Why not just store the date when the price comes into effect? That would make both overlaps and time travel impossible without using any constraints.

    • throwaway7783 13 minutes ago

      Works when there is always an active price. Having an explicit end date allows certain rows to be inactive automatically after validity period. Think of seasonal categories/products etc which dont exist after a specific period

      • larsnystrom 2 minutes ago

        You could also make the price column nullable and just insert a row with price null and the date from which there should be no price.

    • quotemstr 13 minutes ago

      It's a trade-off. If you store both endpoints (as Postgres does) you can continue to think of rows as order-invariant tuples. If you store only one endpoint (as my thing does --- see other comment) you have to impose a meaningful order on the rows in order for them to make sense.

      • larsnystrom a few seconds ago

        Sure, from a theoretical perspective, but in practice there’s got to be some sort of order at some point even when storing timespans.

  • evdubs 17 minutes ago

    > Recently, a new type of question has entered the database arena: what did this data look like last Tuesday?

    This question has been answerable in Dolt for years now.

    • quotemstr 11 minutes ago

      This kind of DB isn't really answering this question. There's a lot more subtlety to time-span analysis than snapshotting. In particular, aligning two series is non-obvious.

      Say you have one time series with CPU-core task switches:

      T=1 task=A T=3, task=B, T=5 task=A, ...

      ... and another of CPU frequency changes ...

      T=2 freq_hz=800, T=5 freq_hz=1200, T=6 freq_hz=900

      How, in SQL, do you express the question "How many CPU cycles did each task use?"? Try to do it with more complex examples. You'll tear your hair out.

      Having worked on this sort of data analysis quite a lot, I'm strongly of the opinion that SQL needs syntax, not just table-valued functions, for expressing questions about timelines.

      • evdubs 7 minutes ago

        The cool thing about Dolt is that you [eventually] get the features of the databases (MySQL, PostgreSQL, SQLite, MongoDB) they emulate, so you can have your PG 19 temporality features as well as branching and merging.

        • quotemstr 6 minutes ago

          Yep. I'm just pointing out that the problems Dolt solves are different from the problems a timeline-aware SQL algebra solves.

  • munk-a 34 minutes ago

    It can be super ugly to try and hand-manage date time range manipulation in a system with moving parts. It is, of course, possible, but it's a headache to try and pull it off when there is complex business logic focused on those datetimes and interactions between ranges. The period constraint is an excellent tool for trivially guaranteeing range coverage (in a case where, for instance, a customer is known to be active from a to b it helps ensure that there aren't any gaps created during the juggling of different sub-ranges) while the new DELETE FOR PERIOD OF syntax makes manipulating swathes of history (I don't care what was happening here this interval should now be X) much more trivial than before.

  • bhaak 24 minutes ago

    I remember reading about this feature for Oracle in the 2000s and was always interested to use it in a production environment.

    It never came to pass when we used Oracle, maybe now with Postgres I will finally have a chance at it.

  • quotemstr 15 minutes ago

    Great! I've been wanting native time-based tables for ages. Years and years ago, I even wrote my own DB engine to get them!

    See https://dancol.org/dctv/index.xml

    I still think there's a lot of value in first-class syntax for time queries. Splitting ranges like Postgres 19 does is a good first step, but there's also a lot of power in broadcasting over these ranges, combining them in various ways, and storing multiple, independent ranges in a single table.

    Ignore the bit about active development: these days, it'd take more sense to add the operators I describe to Postgres and DuckDB than to make a numpy-based engine just to host the analysis. This work predates DuckDB, and it's reassuring that DuckDB (and now Postgres) are thinking along similar lines.

    I'm also glad that in the intervening years "data lake"-style analysis has become more prominent. My ideal data processing pipeline consists of sourcing from raw data and pipelining views all the way to human-meaningful outputs. Materialization, if it occurs, is just an optimization.