Optimizers: The Low-Key MVP

(duckdb.org)

73 points | by tosh 10 hours ago ago

4 comments

adrian17 6 hours ago

> The SQL above results in a plan similar to the DuckDB optimized plan, but it is wordier and more error-prone to write, which can potentially lead to bugs.

FWIW, aside from manual filter pushdown, I consider the JOIN variant the canonical / "default" way to merge multiple tables; it keeps all the join-related logic in one place, while mixing both joining conditions and filtering conditions in WHERE always felt more error-prone to me.

[-]

Sesse__ 6 hours ago

It is also the only way to represent join conditions for outer joins.

ikesau 6 hours ago

> This means your optimizations need to be applied by hand, which is sustainable if your data starts changing.

Seems like a missing "un" here

Compelling article! I've already found DuckDB to be the most ergonomic tool for quick and dirty wrangling, it's good to know it can handle massive jobs too.

[-]

Nihilartikel 3 hours ago

I regularly use duckdb on datasets of 1B+ rows, with nasty strong columns that may be over 10MB per value in the outliers. Mostly it just works, and fast too! When it doesn't, I'll usually just dump to parquet and hit it with sparksql, but that is the exception rather than the rule.