Optimizing Top K in Postgres

(paradedb.com)

46 points | by philippemnoel a day ago ago

4 comments

jmgimeno 26 minutes ago

Maybe I'm wrong, but for this query:

SELECT * FROM benchmark_logs WHERE severity < 3 ORDER BY timestamp DESC LIMIT 10;

this index

CREATE INDEX ON benchmark_logs (severity, timestamp);

cannot be used as proposed: "Postgres can jump directly to the portion of the tree matching severity < 3 and then walk the timestamps in descending order to get the top K rows."

Postgres with this index can walk to a part of the tree with severity < 3, but timestamps are sorted only for the same severity.

davidelettieri an hour ago

The "But Wait, We Need Filters Too" paragraph mentions "US" filter which is introduced only later on.

[-]

GrayShade 37 minutes ago

And footnote 3 is unreferenced.

tacone 18 minutes ago

The issue here is the row based format. You simply can't filter on arbitrary columns with that. Either use an external warehouse or a columnar plug-in like Timescale.