MySQL UNION ALL vs multiple SELECTs: performance on large databases

I have 3 large databases, and I need to run the same query on all 3 of them. The query looks like this:

SELECT table1.a, table1.b, table2.a, table2.c 
FROM databse_A.table1 
INNER JOIN databse_A.table2 ON table1.a = table2.a

I decided to use UNION ALL to combine the results from all three databases, so the final query looked like this:

SELECT table1.a, table1.b, table2.a, table2.c 
FROM databse_A.table1 
INNER JOIN databse_A.table2 ON table1.a = table2.a 
UNION ALL 
SELECT table1.a, table1.b, table2.a, table2.c 
FROM databse_B.table1 
INNER JOIN databse_B.table2 ON table1.a = table2.a 
UNION ALL 
SELECT table1.a, table1.b, table2.a, table2.c 
FROM databse_C.table1 
INNER JOIN databse_C.table2 ON table1.a = table2.a

The above query took 0.0068 s to execute and returned around 3000 rows. Then I tested the same thing without UNION ALL: I cleared the database cache and ran each SELECT as a separate query (each returned around 1000 rows), and the three queries took 0.0023 s in total.

Once the queries were cached, the times dropped from 0.0068 s to 0.0055 s for the UNION ALL and from 0.0023 s to 0.0013 s for the three separate queries.
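To repeat a comparison like this more fairly, it helps to run each variant many times on the same connection and compare medians, because a single timing of a few milliseconds is mostly noise. The sketch below is an assumption-laden stand-in, not the original setup: it uses Python's built-in sqlite3 module with three attached in-memory databases named after the ones in the question, the table contents are invented (1000 matching rows per database), and the helpers make_conn, run_union_all, run_separate and median_seconds are hypothetical names.

```python
import sqlite3
import statistics
import time

DBS = ("databse_A", "databse_B", "databse_C")
SELECT_SQL = ("SELECT table1.a, table1.b, table2.a, table2.c "
              "FROM {db}.table1 INNER JOIN {db}.table2 ON table1.a = table2.a")


def make_conn(rows_per_db=1000):
    # SQLite stand-in: one in-memory database per MySQL database, invented data.
    conn = sqlite3.connect(":memory:")
    for db in DBS:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.table1 (a INTEGER PRIMARY KEY, b TEXT)")
        conn.execute(f"CREATE TABLE {db}.table2 (a INTEGER PRIMARY KEY, c TEXT)")
        data = [(i, f"row{i}") for i in range(rows_per_db)]
        conn.executemany(f"INSERT INTO {db}.table1 VALUES (?, ?)", data)
        conn.executemany(f"INSERT INTO {db}.table2 VALUES (?, ?)", data)
    return conn


def run_union_all(conn):
    # One statement: the three SELECTs glued together with UNION ALL.
    sql = " UNION ALL ".join(SELECT_SQL.format(db=db) for db in DBS)
    return conn.execute(sql).fetchall()


def run_separate(conn):
    # Three statements, results concatenated on the client side.
    rows = []
    for db in DBS:
        rows.extend(conn.execute(SELECT_SQL.format(db=db)).fetchall())
    return rows


def median_seconds(fn, conn, repeat=20):
    # Time the same call repeatedly and keep the median to damp outliers.
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(conn)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    conn = make_conn()
    # Both approaches must return the same rows before timing means anything.
    assert sorted(run_union_all(conn)) == sorted(run_separate(conn))
    print(f"UNION ALL: {median_seconds(run_union_all, conn):.6f} s")
    print(f"separate : {median_seconds(run_separate, conn):.6f} s")
```

Because SQLite runs in-process, there is no network round trip per statement, so this setup understates the per-statement cost a MySQL client pays; the same structure applies with a MySQL driver, but the absolute numbers here say nothing about MySQL.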

So my question is: why is there an almost 3x difference between queries that, in the end, do the same thing? Does the number of returned rows have anything to do with this?

If so, is it better to run several queries that each return fewer rows than one big query that returns many rows?

It depends.

First of all, anything under 10 ms is so fast for a MySQL query that it is hardly worth debating or comparing.

In older versions of MySQL, every UNION created a temporary table, collected the rows from each SELECT into it, and only then delivered the rows from the temp table to the client. That suggests the UNION might be slower.

What version do you have? Newer versions of MySQL (5.7.3 and later) have an optimization that helps your test case, though not every kind of UNION: for a qualifying UNION ALL, the server dispenses with the temp table and delivers the rows from one SELECT at a time directly to the client.
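The difference between the two delivery strategies can be modeled in a few lines of plain Python. This is a conceptual model only, not MySQL's implementation: each SELECT is a hypothetical zero-argument function returning rows, and the event log records when each SELECT executes and when each row is sent. With a temp table, every SELECT runs before the first row goes out; with streaming, the first SELECT's rows are sent before the second SELECT even starts.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[int, str]
Select = Callable[[], List[Row]]  # hypothetical stand-in for one SELECT


def union_all_temp_table(selects: Sequence[Select], log: List[str]) -> Iterator[Row]:
    """Older behaviour: run every SELECT into a temp table, then send rows."""
    temp_table: List[Row] = []
    for i, select in enumerate(selects):
        log.append(f"execute SELECT {i}")
        temp_table.extend(select())  # materialize into the temp table
    for row in temp_table:
        log.append(f"send {row}")
        yield row


def union_all_streaming(selects: Sequence[Select], log: List[str]) -> Iterator[Row]:
    """Optimized UNION ALL: send each SELECT's rows as soon as it produces them."""
    for i, select in enumerate(selects):
        log.append(f"execute SELECT {i}")
        for row in select():
            log.append(f"send {row}")
            yield row


if __name__ == "__main__":
    selects = [lambda i=i: [(i, "x")] for i in range(3)]
    for strategy in (union_all_temp_table, union_all_streaming):
        log = []
        rows = list(strategy(selects, log))
        # Position of the first "send" event = how much work preceded the first row.
        first_send = next(n for n, event in enumerate(log) if event.startswith("send"))
        print(f"{strategy.__name__}: {len(rows)} rows, first row sent after {first_send} event(s)")
```

Both strategies return the same rows in the same order; what changes is how much work happens before the client sees anything, and (not modeled here) the memory or disk the temp table needs.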

There is non-trivial overhead for each separate SQL statement you send to the server (a network round trip, parsing, and so on). The UNION is 1 statement; the 3 separate SELECTs are 3. That suggests the UNION might be faster, especially on newer versions.

Also, because of the overheads mentioned above, 1000 rows (small) may not be representative of what will happen with 1,000,000 rows (medium) or 1 billion rows (huge). (I don't know what you consider "large", but 1000 rows is definitely not "large".)
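The opposing effects above (a fixed cost per statement versus a cost per row, plus an extra per-row cost for the temp table in older versions) can be put into a back-of-the-envelope cost model. Every constant below is a made-up placeholder, not a measurement of MySQL, and the function names are hypothetical; the point is only that a fixed per-statement overhead matters at a few thousand rows and becomes negligible at millions.

```python
# Purely hypothetical cost constants in milliseconds; measure your own server.
PER_STATEMENT_MS = 0.3    # parse + round trip, paid once per statement
PER_ROW_MS = 0.0005       # produce and send one row to the client


def union_all_ms(rows: int, temp_table_row_ms: float = 0.0) -> float:
    # One statement; in older versions every row is also copied into a temp table.
    return PER_STATEMENT_MS + rows * (PER_ROW_MS + temp_table_row_ms)


def separate_ms(rows: int, selects: int = 3) -> float:
    # One statement per SELECT, same per-row cost, no temp table.
    return selects * PER_STATEMENT_MS + rows * PER_ROW_MS


if __name__ == "__main__":
    for rows in (3_000, 3_000_000, 3_000_000_000):
        u, s = union_all_ms(rows), separate_ms(rows)
        print(f"{rows:>13,} rows: UNION ALL {u:,.1f} ms, separate {s:,.1f} ms, "
              f"relative gap {abs(s - u) / min(s, u):.4%}")
```

Passing a nonzero `temp_table_row_ms` mimics the older temp-table behaviour; with a large enough per-row penalty the UNION loses at high row counts, which is why the honest answer is "it depends".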

Bear in mind also that a significant part of the 1.3 ms (the cached total for the three separate queries) is the time it takes to send the data back to the client.
