Crazy that there are no in-depth answers with some EXPLAINs and profiling. Just mentions of ANSI standards.
EDIT: I realized I completely misread the question. I thought it was asking about the difference between using a WHERE clause and putting the conditions in the JOIN (see my reply above for an example). The original SO question is about an old-school CROSS JOIN `table1,table2` (which is a cartesian product in relational algebra) versus a JOIN.
Edit 2: "old school" in the sense of joining tables before JOINs existed. I think System R was the first engine to introduce efficient JOINs vs cross products.
I did a quick test in Postgres using the sample Airlines database.
Here are the two tested queries:
Query 1:
    SELECT
        t.passenger_name,
        t.ticket_no,
        bp.seat_no
    FROM
        Flights f
    JOIN
        Ticket_flights tf ON f.flight_id = tf.flight_id
    JOIN
        Tickets t ON tf.ticket_no = t.ticket_no
    JOIN
        Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
    WHERE
        f.arrival_airport = 'OVB';
Query 2:
    SELECT
        t.passenger_name,
        t.ticket_no,
        bp.seat_no
    FROM
        Flights f
    JOIN
        Ticket_flights tf ON (f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB')
    JOIN
        Tickets t ON tf.ticket_no = t.ticket_no
    JOIN
        Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id;
Then I ran EXPLAIN for both of them and the query plan is THE same. So there's not a big difference at least in Postgres.
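For anyone who wants to poke at this without installing the Airlines database, here is a minimal sketch of the same comparison. It assumes SQLite (through Python's built-in sqlite3 module) rather than Postgres, and the two tiny tables and their rows are made up for illustration; only the shape of the two queries follows the ones above. It checks that both forms return the same rows and prints SQLite's plan for each so you can compare them yourself.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; tables and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight_id INTEGER PRIMARY KEY, arrival_airport TEXT);
    CREATE TABLE ticket_flights (ticket_no TEXT, flight_id INTEGER);
    INSERT INTO flights VALUES (1, 'OVB'), (2, 'SVO'), (3, 'OVB');
    INSERT INTO ticket_flights VALUES ('T1', 1), ('T2', 2), ('T3', 3), ('T4', 3);
""")

# Filter written in the WHERE clause (like Query 1).
FILTER_IN_WHERE = """
    SELECT tf.ticket_no
    FROM flights f
    JOIN ticket_flights tf ON f.flight_id = tf.flight_id
    WHERE f.arrival_airport = 'OVB'
"""

# Same filter moved into the ON clause (like Query 2).
FILTER_IN_ON = """
    SELECT tf.ticket_no
    FROM flights f
    JOIN ticket_flights tf
      ON f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB'
"""


def rows(sql):
    """Run the query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())


def plan(sql):
    """Return the human-readable steps of SQLite's query plan.

    The description text is the last column of each EXPLAIN QUERY PLAN row.
    """
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


rows_where = rows(FILTER_IN_WHERE)
rows_on = rows(FILTER_IN_ON)
print("same rows:", rows_where == rows_on)

for label, sql in (("WHERE", FILTER_IN_WHERE), ("ON", FILTER_IN_ON)):
    for step in plan(sql):
        print(label, "plan:", step)
```

The plan text varies between SQLite versions, so the sketch only prints it instead of asserting anything about it. On Postgres you would run EXPLAIN on the real queries instead.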
The join version is also certainly less readable for anyone who doesn't normally work with databases and is just dipping into the database handling code for some reason.
I think it's better at specifying intent, similarly to how you would use "for" and "while" in a programming language even though they are literally the same thing and more often than not they compile to, respectively, identical query plans and identical asm/bytecode.
Also, if you work a lot with databases you often need to do outer joins; a full cartesian product is almost never what you want. The join syntax is more practical if you need to change what type of join you are performing, especially in big queries.
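To illustrate the point about changing the join type, here is a small sketch. It assumes SQLite via Python's sqlite3 module, and the tables and rows are invented for the example. With the explicit syntax, going from an inner to an outer join is a one-keyword change and the ON condition stays put.

```python
import sqlite3

# Assumption: made-up tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_no TEXT PRIMARY KEY, passenger_name TEXT);
    CREATE TABLE boarding_passes (ticket_no TEXT, seat_no TEXT);
    INSERT INTO tickets VALUES ('T1', 'ANNA'), ('T2', 'BORIS');
    INSERT INTO boarding_passes VALUES ('T1', '12A');  -- BORIS has no boarding pass
""")

# Only the join keyword changes between the two runs.
QUERY = """
    SELECT t.passenger_name, bp.seat_no
    FROM tickets t
    {join_type} JOIN boarding_passes bp ON bp.ticket_no = t.ticket_no
    ORDER BY t.ticket_no
"""

inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()

print(inner_rows)  # only passengers that have a boarding pass
print(left_rows)   # every passenger; seat_no is None when there is no match
```

With the comma syntax there is no keyword to swap, so the same change means restructuring the FROM and WHERE clauses.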
It's all about clearly stating your intent. With INNER JOIN you're literally saying "I want to join these two tables together on this particular relation and work on the result", while with the more basic WHERE form you're saying "just lump these two tables together and then we'll filter out the rows that we actually want to see". The join becomes more of a happy side-effect with that, rather than the thing you clearly want to do.
Not only does writing your code in such a way that it states your intent make it easier to read for other humans, it also makes it easier for compilers/query planners to understand what you're trying to do and turn it into a more efficient process at run-time. Now query planners are usually pretty good at distilling joins from WHERE clauses, but that form does also make it easier for mistakes to creep in that can murder your query performance in subtle and hard-to-debug ways.
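A quick sketch of the kind of mistake meant here, assuming SQLite via Python's sqlite3 module and two made-up tables named after the `table1, table2` example. The explicit and implicit forms agree as long as the join predicate is present, but if the predicate is dropped from the WHERE clause the comma form quietly returns the full cartesian product.

```python
import sqlite3

# Assumption: invented tables and rows, in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, note TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 3, 'z');
""")


def count(sql):
    """Return the single COUNT(*) value produced by the query."""
    return conn.execute(sql).fetchone()[0]


# Explicit join: the predicate lives next to the table it relates to.
explicit = count(
    "SELECT COUNT(*) FROM table1 INNER JOIN table2 ON table2.table1_id = table1.id"
)

# Implicit join: same predicate, written in the WHERE clause.
implicit = count(
    "SELECT COUNT(*) FROM table1, table2 WHERE table2.table1_id = table1.id"
)

# Implicit join with the predicate forgotten: every row pairs with every row.
forgotten = count("SELECT COUNT(*) FROM table1, table2")

print(explicit, implicit, forgotten)
```

Here that is 3 rows versus 9, which is easy to spot. On two large tables the same slip is what produces the surprise performance problems described above.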
I've only looked at the execution plan in SQL Server, but they're the same. Because of the way they're set up you'll get an index seek and an index scan with either the ANSI syntax or the older style. The WHERE clause join does not apply a bunch of filtering on an intermediate state.
With a good enough query planner and optimizer, yes. I'm not sure if that was always the case. I can imagine historically if you were joining on fields that were only indexed in one table and not the other - I'm not saying this is an intelligent thing to do, but it sometimes happens - then controlling the order of the join yourself if the database didn't optimize this for you would be important.
You're setting up a scenario where you'd be in trouble regardless though; controlling the join order would be a local optimization and the query would still probably have a table scan, while appropriate indexing would solve any query planner mistakes and give you a more efficient execution in both styles.
Here's the GPT conversation: https://i.imgur.com/dIzcfnc.jpeg
It doesn't let me share it because it contains an image
Both examples are (to my delight) using aliased table names for all columns which is already a major step up in readability.
I disagree with the apparently more popular notion that INNER JOIN is more readable. Sure, it's more verbose, but that doesn't make it more readable.
It's been a very long time since I've seen a query that uses the `table1, table2` cross-join syntax.
I'd be curious to know how many SQL people nowadays know what that does.
I think it matters especially with big queries. When you have dozens of tables, it really helps to separate the joins from the WHERE clause.
I find it to be especially more readable when I can use JOIN ... USING().
This is assuming that your SQL variant supports it.
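For reference, a small sketch of what JOIN ... USING() looks like next to the ON form. It assumes SQLite (which does support USING) through Python's sqlite3 module, and the tables and rows are invented. USING only works when the join column has the same name in both tables.

```python
import sqlite3

# Assumption: made-up tables that share a ticket_no column, in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_no TEXT PRIMARY KEY, passenger_name TEXT);
    CREATE TABLE boarding_passes (ticket_no TEXT, seat_no TEXT);
    INSERT INTO tickets VALUES ('T1', 'ANNA'), ('T2', 'BORIS');
    INSERT INTO boarding_passes VALUES ('T1', '12A'), ('T2', '14C');
""")

# Join condition spelled out with ON.
on_rows = conn.execute("""
    SELECT t.passenger_name, bp.seat_no
    FROM tickets t
    JOIN boarding_passes bp ON bp.ticket_no = t.ticket_no
    ORDER BY t.passenger_name
""").fetchall()

# Same join with USING: the shared column is named once.
using_rows = conn.execute("""
    SELECT passenger_name, seat_no
    FROM tickets
    JOIN boarding_passes USING (ticket_no)
    ORDER BY passenger_name
""").fetchall()

print(on_rows == using_rows)
```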
From my understanding, those two are different. The result may be the same, but the dataset is different after the FROM is executed.
> ON table1.foreignkey = table2.primarykey
The calculated dataset is the result, after the ON clause.
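For inner joins the final rows come out the same either way. One case where "what the dataset looks like after the ON clause" does change the result is an outer join, so here is a hedged sketch of that. It assumes SQLite via Python's sqlite3 module, with invented tables and rows; it is only meant to show the conceptual difference, not how any engine actually executes the query.

```python
import sqlite3

# Assumption: made-up tables and rows, in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_no TEXT PRIMARY KEY, passenger_name TEXT);
    CREATE TABLE boarding_passes (ticket_no TEXT, seat_no TEXT);
    INSERT INTO tickets VALUES ('T1', 'ANNA'), ('T2', 'BORIS');
    INSERT INTO boarding_passes VALUES ('T1', '12A'), ('T2', '14C');
""")

# Condition in ON: it decides which boarding passes match, and unmatched
# tickets are still kept by the LEFT JOIN (with NULLs on the right side).
condition_in_on = conn.execute("""
    SELECT t.passenger_name, bp.seat_no
    FROM tickets t
    LEFT JOIN boarding_passes bp
      ON bp.ticket_no = t.ticket_no AND bp.seat_no = '12A'
    ORDER BY t.ticket_no
""").fetchall()

# Condition in WHERE: it filters the already-joined rows, so tickets whose
# boarding pass is not seat 12A disappear entirely.
condition_in_where = conn.execute("""
    SELECT t.passenger_name, bp.seat_no
    FROM tickets t
    LEFT JOIN boarding_passes bp ON bp.ticket_no = t.ticket_no
    WHERE bp.seat_no = '12A'
    ORDER BY t.ticket_no
""").fetchall()

print(condition_in_on)
print(condition_in_where)
```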
Wonder what's the difference in PRQL...
Is there a difference between implicit and explicit joins? No, but use an explain plan to be sure.
(Saved you a click)
TLDR: they're the same
But we still don't know if it's more expensive or not... I'm hitting F5 on this page hoping to find out.