
JOIN or INNER SELECT with IN, which is faster?

I was wondering which is faster: an INNER JOIN, or an inner SELECT with IN?

select t1.* from test1 t1
inner join test2 t2 on t1.id = t2.id
where t2.id = 'blah'

OR

select t1.* from test1 t1
where t1.id IN (select t2.id from test2 t2 where t2.id = 'blah')

Assuming id is a key, these queries mean the same thing, and a decent DBMS will execute them in exactly the same way. Unfortunately MySQL doesn't, as can be seen by expanding the "View Execution Plan" link in this SQL Fiddle. Which one is faster probably depends on the size of the tables: if test1 has very few rows, IN has a chance of being faster, while JOIN will likely be faster in all other cases.
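To make the "assuming id is a key" condition concrete, here is a minimal sketch that runs both queries from the question. It uses Python's built-in sqlite3 module rather than MySQL, purely so it runs without a server; the helper name compare_join_and_in and the sample rows are made up for illustration. It only shows what the queries return, not how fast they are.

```python
import sqlite3

def compare_join_and_in(test2_ids):
    """Run both query forms against a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test1 (id TEXT PRIMARY KEY)")
    # test2.id is deliberately NOT declared unique here, so duplicates can be inserted.
    conn.execute("CREATE TABLE test2 (id TEXT)")
    conn.executemany("INSERT INTO test1 VALUES (?)", [("blah",), ("other",)])
    conn.executemany("INSERT INTO test2 VALUES (?)", [(i,) for i in test2_ids])

    join_rows = conn.execute(
        "SELECT t1.* FROM test1 t1 "
        "INNER JOIN test2 t2 ON t1.id = t2.id "
        "WHERE t2.id = 'blah'"
    ).fetchall()
    in_rows = conn.execute(
        "SELECT t1.* FROM test1 t1 "
        "WHERE t1.id IN (SELECT t2.id FROM test2 t2 WHERE t2.id = 'blah')"
    ).fetchall()
    conn.close()
    return join_rows, in_rows

# id values are unique in test2: both forms return the same single row.
unique_join, unique_in = compare_join_and_in(["blah", "x"])
# 'blah' appears twice in test2: the JOIN repeats the test1 row, IN does not.
dup_join, dup_in = compare_join_and_in(["blah", "blah"])

print(unique_join, unique_in)
print(dup_join, dup_in)
```

So the two forms are only interchangeable when test2.id is unique. If it is not, the JOIN needs a DISTINCT (or similar) to match the IN version.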

This is a peculiarity of MySQL's query optimizer. I've never seen Oracle, PostgreSQL or MS SQL Server execute such simple equivalent queries differently.

If you have to guess, INNER JOIN is likely to be more efficient than an IN (SELECT ...), but that can vary from one query to another.

The EXPLAIN keyword is one of your best friends. Type EXPLAIN in front of your complete SELECT query and MySQL will give you some basic information about how it will execute the query. It'll tell you where it's using file sorts, where it's using indices you've created (and where it's ignoring them), and how many rows it will probably have to examine to fulfill the request.
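As a rough illustration of comparing plans, the sketch below asks the database for the plan of each query form. It runs on SQLite (via Python's sqlite3) so that it is self-contained, and SQLite spells the command EXPLAIN QUERY PLAN. On MySQL you would instead put EXPLAIN in front of the SELECT, and the output columns are different. The query_plan helper is a made-up name, and both id columns are assumed to be primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE test2 (id TEXT PRIMARY KEY)")

QUERIES = {
    "join": (
        "SELECT t1.* FROM test1 t1 "
        "INNER JOIN test2 t2 ON t1.id = t2.id "
        "WHERE t2.id = 'blah'"
    ),
    "in": (
        "SELECT t1.* FROM test1 t1 "
        "WHERE t1.id IN (SELECT t2.id FROM test2 t2 WHERE t2.id = 'blah')"
    ),
}

def query_plan(connection, sql):
    """Return the plan steps SQLite reports for one query."""
    # SQLite syntax; the MySQL equivalent is "EXPLAIN " + sql.
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable description.
    return [row[-1] for row in rows]

plans = {name: query_plan(conn, sql) for name, sql in QUERIES.items()}
for name, steps in plans.items():
    print(name)
    for step in steps:
        print("   ", step)
```

The exact wording of the steps depends on the engine and its version, so treat the output as something to read for your own setup rather than a fixed result.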

If all else is equal, use the INNER JOIN, mostly because it's more predictable and thus easier for a new developer coming in to understand. But of course, if you see a real advantage to the IN (SELECT ...) form, use it!

Though you'd have to check the execution plan on whatever RDBMS you're asking about, I would guess the inner join would be faster, or at least the same. Perhaps someone will correct me if I'm wrong.

The nested select will most likely run the entire inner query anyway, and build a hash table of possible values from test2. If that query returns a million rows, you've incurred the cost of loading that data into memory no matter what.

With the inner join, if test1 only has 2 rows, it will probably just do 2 index scans on test2 for the id values of each of those rows, and not have to load a million rows into memory.
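The two strategies described above can be sketched in plain Python to show where the cost goes. This is a toy model, not what any particular database actually does: the function names are invented, a dict stands in for an index that already exists on test2.id, and the "cost" is just a count of rows read or index probes made.

```python
def in_with_hash_table(outer_ids, inner_rows):
    """Nested-select strategy: read every inner row into a hash table first."""
    lookup = set()
    rows_read = 0
    for row_id in inner_rows:
        lookup.add(row_id)
        rows_read += 1          # cost grows with the size of test2
    matches = [i for i in outer_ids if i in lookup]
    return matches, rows_read

def join_with_index_lookups(outer_ids, inner_index):
    """Join strategy: one index probe on test2 per row of test1."""
    matches = []
    probes = 0
    for i in outer_ids:
        probes += 1             # cost grows with the size of test1
        if i in inner_index:
            matches.append(i)
    return matches, probes

test1 = [5, 200_000]                          # only 2 rows
test2_rows = range(100_000)                   # a large inner table
test2_index = {i: None for i in test2_rows}   # stands in for an existing index on test2.id

hash_matches, rows_read = in_with_hash_table(test1, test2_rows)
join_matches, probes = join_with_index_lookups(test1, test2_index)

print(hash_matches, rows_read)   # [5] 100000
print(join_matches, probes)      # [5] 2
```

Both strategies find the same match, but the first reads all of test2 while the second touches it only twice. A real optimizer may of course pick either strategy for either query form.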

It's also possible that a more modern database system can optimize the first scenario, since it has statistics on each table. Even in that best case, though, the nested select would only match the inner join, not beat it.

In most cases a JOIN is much faster than a subquery, but a subquery is more readable than a JOIN.

The RDBMS creates an execution plan for the JOIN, so it can predict which data needs to be loaded for processing. This saves time. For the subquery, on the other hand, it runs all the queries and loads all their data before doing the processing.

For more details please check this link.
