Waiting for SQL:202y: Group by All

(peter.eisentraut.org)

59 points | by ingve 5 months ago ago

40 comments

elygre 5 months ago

Let me reference fields as I create them:

  select xxxxx as a
       , a * 2 as b

[-]

zX41ZdbW 5 months ago

This will be great! One of the things ClickHouse has had since 2016.

[-]

5 months ago

[deleted]

cyberax 5 months ago

SQL needs to have `select` as the _last_ part, not the first. LINQ has had this for 2 decades by now: "from table_a as a, table_b as b where ... select a.blah, b.duh".

[-]

cryptonector 5 months ago

This is not relevant to GP's point. This is a separate topic, which... I don't really care, but I know a lot of people want to be able to write SQL as you suggest, and it's not hard to implement, so, sure.

Though, I think it might have to be table sources, then `SELECT`, then `WHERE`, then ... because you might want to refer to output columns in the `WHERE` clause.

[-]

snuxoll 5 months ago

WHERE clauses are pushed down into the query planner before the SELECT list is processed, that’s why HAVING exists.

The logical order, in full, is:

FROM

WHERE/JOIN (you can join using WHERE clauses and do FROM a,b still)

SELECT

HAVING

[-]

1718627440 4 months ago

That's the order in which the processing happens, but this doesn't need to be reflected in the language. The language has this ordering so it sounds like a natural language which SQL was invented for.

cryptonector 4 months ago

See u/cyberax's comment below. It would be nice to be able to create scalar (as opposed to table-valued) bindings that can be referred to in a WHERE (or JOIN) clause. Currently it's SELECT that establishes such bindings, and... well, it's not terribly clear where they can be used (certainly in HAVING, but first you have to GROUP BY, no?). u/cyberax's idea is to have a LET for this that can come before WHERE and before SELECT.

[-]

snuxoll 4 months ago

I mean, I get it, but the big problem is, again, the different phases of execution. The projections you perform with a select can be absolutely arbitrary and do crazy ass things (like do more subqueries that return scalar values, and query planners are notoriously bad at pushing these down), which is why I was trying to say SELECT before WHERE (project before filtering) may be linguistically intuitive, but full of foot guns.

Something like a ‘let’ binding after the FROM/JOIN list would make sense, though - from the query planners perspective it’s nothing more than a token substitution and everything would compile the same.

cyberax 5 months ago

Ideally, it needs to be "from", then arbitrary number of something like `let` statements that can introduce new variables, maybe interspersed with where-s, and then finally "select".

"select" can also be replaced with annotations, something like: `from table_1 t1 let t1.column_1 as @output_1 where ...` and then just collect all the @-annotated variables.

I need to write a lot of SQL, and it's so clumsy. Every time I need a CTE, I have to look into the documentation for the exact syntax.

[-]

1718627440 4 months ago

> Ideally, it needs to be "from", then arbitrary number of something like `let` statements

Isn't that what a CTE is?

[-]

cryptonector 4 months ago

Not quite. u/cyberax wants scalar bindings, not table-valued bindings.

Something like

  FROM foo
  LET a = (x + y) * z
  SELECT a;

whereas CTEs are... Common Table Expressions.

tracker1 4 months ago

That was kind of my first thought...

viraptor 4 months ago

https://prql-lang.org/ and compile to SQL.

[-]

cyberax 4 months ago

Thank you! This is indeed close to what I want from SQL!

agnosticmantis 5 months ago

The Pipe Query Syntax in GoogleSQL implements this elegantly as well:

https://docs.cloud.google.com/bigquery/docs/reference/standa...

jiggawatts 4 months ago

Also in the Kusto Query Language (KQL) as used by Azure Log Analytics.

Exuma 5 months ago

Also just let me reference the damn alias in a group by, FUCK

[-]

sbuttgereit 5 months ago

At least in PostgreSQL, both by alias and ordinal are possible:

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:15 PST 2025]
  > create table test (a_key integer primary key, a_group integer, a_val numeric);
  CREATE TABLE
  Time: 3.102 ms

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:25 PST 2025]
  > insert into test (a_key, a_group, a_val) values (1, 1, 5.5), (2, 1, 2.6), (3, 2, 1.1), (4, 2, 6.5);
  INSERT 0 4
  Time: 2.302 ms

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:58 PST 2025]
  > select a_group AS my_group, sum(a_val) from test group by my_group;
   my_group | sum
  ----------+-----
          2 | 7.6
          1 | 8.1
  (2 rows)
  
  Time: 4.124 ms
  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:03:15 PST 2025]
  > select a_group AS my_group, sum(a_val) from test group by 1;
   my_group | sum
  ----------+-----
          2 | 7.6
          1 | 8.1
  (2 rows)
  
  Time: 0.360 ms

mberning 5 months ago

Some do. It would also be nice to reference by ordinal number similar to order by. Very handy for quick and dirty queries. I can see the issue though that people start to lean on it too much.

petereisentraut 4 months ago

The problem with this and similar requests is that it would change the identifier scoping in incompatible ways and therefore potentially break a lot of existing SQL code.

zX41ZdbW 5 months ago

I think it should be not only in GROUP BY, but in every context, e.g., inside expressions in SELECT, WHERE, etc.

kermatt 5 months ago

PostgreSQL and DuckDB support this, which makes MSSQL feel like a dinosaur in context.

theodpHN 5 months ago

So, why not a SORT BY ALL or a GROUPSORT BY ALL, too? Not always what you want (e.g., when you're ranking on a summarized column), but it often alphabetic order on the GROUP BY columns is just what the doctor ordered! :-)

[-]

petereisentraut 4 months ago

The working group also discussed ORDER BY ALL, but for some reason most participants really did not like it.

oulipo2 5 months ago

Not directly related, but I saw this project recently of a data language by google which is quite cool https://www.malloydata.dev/

cm2187 5 months ago

Snowflake has that, once you start using it, it's painful to go back.

sixtram 5 months ago

What about reusing a CTE? Let me import a CTE definition so that it can be used throughout my app, not just in the current context.

[-]

jklowden 4 months ago

I believe that’s what we call a "view".

chewxy 5 months ago

BigQuery has that and I've been loving using it since they introduced it

elchief 5 months ago

duckdb has it

https://duckdb.org/docs/stable/sql/query_syntax/groupby

parpfish 5 months ago

this seems to ignore the fact that you can group by a column that isn't in the select statement.

it's not something that i've found a particular use for, but it IS a thing you can do.

Inviz 5 months ago

What's wrong with GROUP BY 1,2,3?

SigmundA 5 months ago

SELECT * EXCEPT(col_name) next please.

[-]

petereisentraut 4 months ago

This was also discussed at the last SQL WG meeting but was postponed for further refinement. But it’s likely to be added soon.

azurezyq 5 months ago

BigQuery has it! https://docs.cloud.google.com/bigquery/docs/reference/standa...

[-]

SigmundA 5 months ago

Yes it needs to be in the standard though.

1718627440 4 months ago

That might be nice for manual experimentation, but for application use, this seems brittle compared to specifying the columns you really want to have and process.

dorianmariecom 5 months ago

would be nice

wvbdmp 5 months ago

What? No! I want GROUP BY * and more importantly GROUP BY mytable.*