INNER JOIN ON vs WHERE clause

(stackoverflow.com)

36 points | by thunderbong 3 hours ago

18 comments

  • santiagobasulto 2 hours ago

    Crazy that there are no in-depth answers with some EXPLAINs and profiling. Just mentions of ANSI standards.

    EDIT: I realized I completely misread the question. I thought it was asking about the difference between using a WHERE clause and putting the conditions in the JOIN (see my reply below for an example). The original SO question is about an old-school CROSS JOIN `table1,table2` (which is a cartesian product in relational algebra) versus a JOIN.

    Edit 2: "old school" in the sense of joining tables before JOINs existed. I think System R was the first engine to introduce efficient JOINs vs cross products.
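For readers who have not seen the comma syntax, here is a minimal sketch of the two forms the SO question compares. It assumes SQLite via Python's `sqlite3` as a stand-in engine; the `table1`/`table2` names follow the question, while the columns `id` and `label` and all of the rows are invented for illustration.

```python
import sqlite3

# Hypothetical schema and data, made up for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (primarykey INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, foreignkey INTEGER);
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table1 VALUES (10, 1), (11, 1), (12, 3), (13, NULL);
""")

# Old-style implicit join: comma-separated tables, join predicate in WHERE.
implicit = conn.execute("""
    SELECT table1.id, table2.label
    FROM table1, table2
    WHERE table1.foreignkey = table2.primarykey
    ORDER BY table1.id
""").fetchall()

# Explicit join: the same predicate moved into ON.
explicit = conn.execute("""
    SELECT table1.id, table2.label
    FROM table1
    INNER JOIN table2 ON table1.foreignkey = table2.primarykey
    ORDER BY table1.id
""").fetchall()

# Without any predicate, the comma form is the full cartesian product.
cross_count = conn.execute("SELECT COUNT(*) FROM table1, table2").fetchone()[0]

print(implicit)
print(explicit)
print(cross_count)
```

Both queries return the same rows here. Whether the plans also match depends on the engine, which is what the replies below look into.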

    • santiagobasulto 2 hours ago

      I did a quick test in Postgres using the sample Airlines database.

      Here are the two tested queries:

      Query 1:

          SELECT 
              t.passenger_name, 
              t.ticket_no, 
              bp.seat_no
          FROM 
              Flights f
          JOIN 
              Ticket_flights tf ON f.flight_id = tf.flight_id
          JOIN 
              Tickets t ON tf.ticket_no = t.ticket_no
          JOIN 
              Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
          WHERE 
              f.arrival_airport = 'OVB';
      
      Query 2:

          SELECT 
              t.passenger_name, 
              t.ticket_no, 
              bp.seat_no
          FROM 
              Flights f
          JOIN 
              Ticket_flights tf ON (f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB')
          JOIN 
              Tickets t ON tf.ticket_no = t.ticket_no
          JOIN 
              Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
      
      Then I ran EXPLAIN for both of them and the query plan is THE same. So there's not a big difference at least in Postgres.

      Here's the GPT conversation: https://i.imgur.com/dIzcfnc.jpeg

      It doesn't let me share it because it contains an image
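A rough way to repeat this kind of check without the Airlines database is sketched below. It is not the Postgres experiment described above: it assumes SQLite through Python's `sqlite3`, a cut-down two-table schema and invented rows. Only the column names `flight_id`, `ticket_no` and `arrival_airport` come from the queries in the comment.

```python
import sqlite3

# Cut-down, hypothetical version of the schema; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight_id INTEGER PRIMARY KEY, arrival_airport TEXT);
    CREATE TABLE ticket_flights (ticket_no TEXT, flight_id INTEGER);
    INSERT INTO flights VALUES (1, 'OVB'), (2, 'LED'), (3, 'OVB');
    INSERT INTO ticket_flights VALUES ('T1', 1), ('T2', 2), ('T3', 3), ('T4', 3);
""")

filter_in_where = """
    SELECT tf.ticket_no
    FROM flights f
    JOIN ticket_flights tf ON f.flight_id = tf.flight_id
    WHERE f.arrival_airport = 'OVB'
"""

filter_in_on = """
    SELECT tf.ticket_no
    FROM flights f
    JOIN ticket_flights tf
      ON f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB'
"""

rows_where = sorted(conn.execute(filter_in_where).fetchall())
rows_on = sorted(conn.execute(filter_in_on).fetchall())


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print(rows_where == rows_on)
print(plan(filter_in_where))
print(plan(filter_in_on))
```

The result sets are identical for an inner join. The printed plans are there to be compared by eye; their exact text varies by SQLite version, so nothing is asserted about them.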

      • serpix 34 minutes ago

        Both examples are (to my delight) using aliased table names for all columns which is already a major step up in readability.

  • shrx 2 hours ago

    I disagree with the apparently more popular notion that INNER JOIN is more readable. Sure, it's more verbose, but that doesn't make it more readable.

    • yen223 9 minutes ago

      It's been a very long time since I've seen a query that uses the `table1, table2` cross-join syntax.

      I'd be curious to know how many SQL people nowadays know what that does.

    • stravant 2 hours ago

      The join version is also certainly less readable for anyone who doesn't normally work with databases and is just dipping into the database handling code for some reason.

    • forinti an hour ago

      I think it matters especially with big queries. When you have dozens of tables, it really helps to separate the joins from the where clause.

    • qsort 2 hours ago

      I think it's better at specifying intent, similarly to how you would use "for" and "while" in a programming language even though they are literally the same thing and more often than not they compile to, respectively, identical query plans and identical asm/bytecode.

      Also if you work a lot with databases you often need to do outer joins, a full cartesian product is almost never what you want. The join syntax is more practical if you need to change what type of join you are performing, especially in big queries.
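The point about changing join types can be sketched as follows, assuming SQLite via Python's `sqlite3` and a hypothetical `authors`/`books` schema invented for the example. Swapping one keyword turns the inner join into an outer join. With an outer join, unlike the inner-join case discussed elsewhere in this thread, moving a filter between ON and WHERE does change the result.

```python
import sqlite3

# Hypothetical schema: Bob has no books, so he only survives an outer join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (100, 1, 'First');
""")


def joined(join_keyword):
    # Only the join keyword changes; the ON predicate stays the same.
    sql = f"""
        SELECT a.name, b.title
        FROM authors a
        {join_keyword} books b ON a.author_id = b.author_id
        ORDER BY a.name
    """
    return conn.execute(sql).fetchall()


inner_rows = joined("INNER JOIN")
left_rows = joined("LEFT JOIN")

# Extra filter inside ON: unmatched authors are kept, padded with NULLs.
left_filter_in_on = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    LEFT JOIN books b ON a.author_id = b.author_id AND b.title = 'Other'
    ORDER BY a.name
""").fetchall()

# Same filter in WHERE: it runs after the join and discards the NULL rows.
left_filter_in_where = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    LEFT JOIN books b ON a.author_id = b.author_id
    WHERE b.title = 'Other'
    ORDER BY a.name
""").fetchall()

print(inner_rows)
print(left_rows)
print(left_filter_in_on)
print(left_filter_in_where)
```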

    • ndepoel 27 minutes ago

      It's all about clearly stating your intent. With INNER JOIN you're literally saying "I want to join these two tables together on this particular relation and work on the result", while with the more basic WHERE form you're saying "just lump these two tables together and then we'll filter out the rows that we actually want to see". The join becomes more of a happy side-effect with that, rather than the thing you clearly want to do.

      Not only does writing your code in such a way that it states your intent make it easier to read for other humans, it also makes it easier for compilers/query planners to understand what you're trying to do and turn it into a more efficient process at run-time. Now query planners are usually pretty good at distilling joins from WHERE clauses, but that form does also make it easier for mistakes to creep in that can murder your query performance in subtle and hard-to-debug ways.
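One such mistake is sketched below: with the comma form, dropping a single predicate from the WHERE clause still yields a valid query, only with a silently multiplied row count. The example assumes SQLite via Python's `sqlite3`; the `customers`/`orders`/`items` tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical three-table chain: 3 customers, 6 orders, 12 items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, order_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [(c,) for c in range(1, 4)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(o, (o % 3) + 1) for o in range(1, 7)])
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, (i % 6) + 1) for i in range(1, 13)])

# Both join predicates present: one row per item.
complete = conn.execute("""
    SELECT COUNT(*)
    FROM customers c, orders o, items i
    WHERE c.customer_id = o.customer_id
      AND o.order_id = i.order_id
""").fetchone()[0]

# The items predicate was forgotten: every order now pairs with every item.
forgotten = conn.execute("""
    SELECT COUNT(*)
    FROM customers c, orders o, items i
    WHERE c.customer_id = o.customer_id
""").fetchone()[0]

print(complete)
print(forgotten)
```

With explicit JOIN syntax the missing predicate is at least easier to spot, and some engines (PostgreSQL, for one) reject an INNER JOIN that has no ON or USING clause. SQLite accepts it, so that behavior is not demonstrated here.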

    • chasil an hour ago

      I find it to be especially more readable when I can use JOIN ... USING().

      This is assuming that your SQL variant supports it.
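A small sketch of `JOIN ... USING()`, assuming SQLite (which supports it) via Python's `sqlite3`. The table and column names echo the Airlines queries earlier in the thread, but the two-table schema and the rows are invented for the example.

```python
import sqlite3

# Hypothetical cut-down schema sharing the join column name ticket_no.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_no TEXT PRIMARY KEY, passenger_name TEXT);
    CREATE TABLE boarding_passes (ticket_no TEXT, seat_no TEXT);
    INSERT INTO tickets VALUES ('T1', 'ANN'), ('T2', 'BOB');
    INSERT INTO boarding_passes VALUES ('T1', '12A'), ('T2', '3C');
""")

on_rows = conn.execute("""
    SELECT t.passenger_name, bp.seat_no
    FROM tickets t
    JOIN boarding_passes bp ON t.ticket_no = bp.ticket_no
    ORDER BY t.passenger_name
""").fetchall()

# USING names the shared column once; no aliases are needed in the predicate.
using_rows = conn.execute("""
    SELECT passenger_name, seat_no
    FROM tickets
    JOIN boarding_passes USING (ticket_no)
    ORDER BY passenger_name
""").fetchall()

# With SELECT *, ON keeps both copies of the join column and USING keeps one.
on_cols = [d[0] for d in conn.execute(
    "SELECT * FROM tickets t JOIN boarding_passes bp"
    " ON t.ticket_no = bp.ticket_no").description]
using_cols = [d[0] for d in conn.execute(
    "SELECT * FROM tickets JOIN boarding_passes USING (ticket_no)").description]

print(on_rows == using_rows)
print(len(on_cols), len(using_cols))
```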

  • icsrutil 2 hours ago

    From my understanding, those two are different. The result may be the same, but the dataset is different after the FROM is executed.

    > ON table1.foreignkey = table2.primarykey

    The calculated dataset is the result, after the ON clause.

    • skeeter2020 13 minutes ago

      I've only looked at the execution plan in SQL Server, but they're the same. Because of the way they're set up you'll get an index seek and an index scan when using the ANSI syntax or the older style. The WHERE clause join does not apply a bunch of filtering on an intermediate state.

  • yyx an hour ago

    Wonder what's the difference in PRQL...

  • iblaine an hour ago

    Is there a difference between implicit and explicit joins? No, but use an explain plan to be sure.

    (Saved you a click)

  • a2800276 3 hours ago

    TLDR: they're the same

    • red_admiral 2 hours ago

      With a good enough query planner and optimizer, yes. I'm not sure if that was always the case. I can imagine historically if you were joining on fields that were only indexed in one table and not the other - I'm not saying this is an intelligent thing to do, but it sometimes happens - then controlling the order of the join yourself if the database didn't optimize this for you would be important.

      • skeeter2020 9 minutes ago

        You're setting up a scenario where you'd be in trouble regardless though; controlling the join order would be a local optimization and the query would still probably have a table scan, while appropriately indexing would solve any query planner mistakes and give you a more efficient execution in both styles.

    • coretx 28 minutes ago

      But we still don't know if it's more expensive or not... I'm hitting F5 on this page hoping to find out.