Waiting for SQL:202y: Group by All

(peter.eisentraut.org)

59 points | by ingve 5 months ago ago

40 comments

  • elygre 5 months ago

    Let me reference fields as I create them:

      select xxxxx as a
           , a * 2 as b
    • zX41ZdbW 5 months ago

      This will be great! One of the things ClickHouse has had since 2016.

      • 5 months ago
        [deleted]
    • cyberax 5 months ago

      SQL needs to have `select` as the _last_ part, not the first. LINQ has had this for 2 decades by now: "from table_a as a, table_b as b where ... select a.blah, b.duh".

      • cryptonector 5 months ago

        This is not relevant to GP's point. This is a separate topic, which... I don't really care, but I know a lot of people want to be able to write SQL as you suggest, and it's not hard to implement, so, sure.

        Though, I think it might have to be table sources, then `SELECT`, then `WHERE`, then ... because you might want to refer to output columns in the `WHERE` clause.

        • snuxoll 5 months ago

          WHERE clauses are pushed down into the query planner before the SELECT list is processed, that’s why HAVING exists.

          The logical order, in full, is:

          FROM

          WHERE/JOIN (you can join using WHERE clauses and do FROM a,b still)

          SELECT

          HAVING

          • 1718627440 4 months ago

            That's the order in which the processing happens, but this doesn't need to be reflected in the language. The language has this ordering so it sounds like a natural language which SQL was invented for.

          • cryptonector 4 months ago

            See u/cyberax's comment below. It would be nice to be able to create scalar (as opposed to table-valued) bindings that can be referred to in a WHERE (or JOIN) clause. Currently it's SELECT that establishes such bindings, and... well, it's not terribly clear where they can be used (certainly in HAVING, but first you have to GROUP BY, no?). u/cyberax's idea is to have a LET for this that can come before WHERE and before SELECT.

            • snuxoll 4 months ago

              I mean, I get it, but the big problem is, again, the different phases of execution. The projections you perform with a select can be absolutely arbitrary and do crazy ass things (like do more subqueries that return scalar values, and query planners are notoriously bad at pushing these down), which is why I was trying to say SELECT before WHERE (project before filtering) may be linguistically intuitive, but full of foot guns.

              Something like a ‘let’ binding after the FROM/JOIN list would make sense, though - from the query planners perspective it’s nothing more than a token substitution and everything would compile the same.

        • cyberax 5 months ago

          Ideally, it needs to be "from", then arbitrary number of something like `let` statements that can introduce new variables, maybe interspersed with where-s, and then finally "select".

          "select" can also be replaced with annotations, something like: `from table_1 t1 let t1.column_1 as @output_1 where ...` and then just collect all the @-annotated variables.

          I need to write a lot of SQL, and it's so clumsy. Every time I need a CTE, I have to look into the documentation for the exact syntax.

          • 1718627440 4 months ago

            > Ideally, it needs to be "from", then arbitrary number of something like `let` statements

            Isn't that what a CTE is?

            • cryptonector 4 months ago

              Not quite. u/cyberax wants scalar bindings, not table-valued bindings.

              Something like

                FROM foo
                LET a = (x + y) * z
                SELECT a;
              
              whereas CTEs are... Common Table Expressions.
            • tracker1 4 months ago

              That was kind of my first thought...

      • viraptor 4 months ago

        https://prql-lang.org/ and compile to SQL.

        • cyberax 4 months ago

          Thank you! This is indeed close to what I want from SQL!

      • agnosticmantis 5 months ago

        The Pipe Query Syntax in GoogleSQL implements this elegantly as well:

        https://docs.cloud.google.com/bigquery/docs/reference/standa...

      • jiggawatts 4 months ago

        Also in the Kusto Query Language (KQL) as used by Azure Log Analytics.

  • Exuma 5 months ago

    Also just let me reference the damn alias in a group by, FUCK

    • sbuttgereit 5 months ago

      At least in PostgreSQL, both by alias and ordinal are possible:

        localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:15 PST 2025]
        > create table test (a_key integer primary key, a_group integer, a_val numeric);
        CREATE TABLE
        Time: 3.102 ms
      
        localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:25 PST 2025]
        > insert into test (a_key, a_group, a_val) values (1, 1, 5.5), (2, 1, 2.6), (3, 2, 1.1), (4, 2, 6.5);
        INSERT 0 4
        Time: 2.302 ms
      
        localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:58 PST 2025]
        > select a_group AS my_group, sum(a_val) from test group by my_group;
         my_group | sum
        ----------+-----
                2 | 7.6
                1 | 8.1
        (2 rows)
        
        Time: 4.124 ms
        localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:03:15 PST 2025]
        > select a_group AS my_group, sum(a_val) from test group by 1;
         my_group | sum
        ----------+-----
                2 | 7.6
                1 | 8.1
        (2 rows)
        
        Time: 0.360 ms
    • mberning 5 months ago

      Some do. It would also be nice to reference by ordinal number similar to order by. Very handy for quick and dirty queries. I can see the issue though that people start to lean on it too much.

    • petereisentraut 4 months ago

      The problem with this and similar requests is that it would change the identifier scoping in incompatible ways and therefore potentially break a lot of existing SQL code.

    • zX41ZdbW 5 months ago

      I think it should be not only in GROUP BY, but in every context, e.g., inside expressions in SELECT, WHERE, etc.

    • kermatt 5 months ago

      PostgreSQL and DuckDB support this, which makes MSSQL feel like a dinosaur in context.

  • theodpHN 5 months ago

    So, why not a SORT BY ALL or a GROUPSORT BY ALL, too? Not always what you want (e.g., when you're ranking on a summarized column), but it often alphabetic order on the GROUP BY columns is just what the doctor ordered! :-)

    • petereisentraut 4 months ago

      The working group also discussed ORDER BY ALL, but for some reason most participants really did not like it.

  • oulipo2 5 months ago

    Not directly related, but I saw this project recently of a data language by google which is quite cool https://www.malloydata.dev/

  • cm2187 5 months ago

    Snowflake has that, once you start using it, it's painful to go back.

  • sixtram 5 months ago

    What about reusing a CTE? Let me import a CTE definition so that it can be used throughout my app, not just in the current context.

    • jklowden 4 months ago

      I believe that’s what we call a "view".

  • chewxy 5 months ago

    BigQuery has that and I've been loving using it since they introduced it

  • elchief 5 months ago
  • parpfish 5 months ago

    this seems to ignore the fact that you can group by a column that isn't in the select statement.

    it's not something that i've found a particular use for, but it IS a thing you can do.

  • Inviz 5 months ago

    What's wrong with GROUP BY 1,2,3?

  • SigmundA 5 months ago

    SELECT * EXCEPT(col_name) next please.

  • dorianmariecom 5 months ago

    would be nice

  • wvbdmp 5 months ago

    What? No! I want GROUP BY * and more importantly GROUP BY mytable.*