
Performance of MySQL UNION LIMITs

I've got two distinct MySQL tables that both contain time-series data (in that both contain a 'timestamp' column). Apart from the 'timestamp' and 'client_id' columns, the two tables have nothing in common.

table_a
- id
- client_id
- timestamp
- ...

table_b
- id
- client_id
- timestamp
- ...

Both tables are indexed on (client_id, timestamp).

I'm trying to combine these two tables into a single paginated time-series. Concretely, I want to load N records (with an offset of M) from the union of table_a and table_b, ordered by timestamp.

I've tried to do this with a statement like this:

(SELECT 'a', id, timestamp FROM table_a WHERE client_id=1) UNION (SELECT 'b', id, timestamp FROM table_b WHERE client_id=1) ORDER BY timestamp LIMIT 100;

Unfortunately, the resulting query seems to be grabbing all matching rows from both tables, combining, and then applying the LIMIT.

Note that queries against the individual tables are super fast:

SELECT 'a', id FROM table_a WHERE client_id=1 ORDER BY timestamp LIMIT 100;

Is there a better way to index the tables or write the UNION query?

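MySQL can't use an index to satisfy the ORDER BY ... LIMIT here, because the sort runs over the combined result and the two indexes belong to two separate tables.

You could add a LIMIT to each individual SELECT, but with LIMIT N this only works for the first page. For an offset of M, each SELECT would have to fetch M + N rows, so deeper pages get progressively slower.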

If you are willing to drop the requirement that a "page" is limit plus offset, you can paginate in some other absolute (rather than relative) way, such as by day. For example:

(SELECT 'a', id, timestamp FROM table_a WHERE client_id=1 AND timestamp >= '2014-04-18 00:00:00' AND timestamp < '2014-04-19 00:00:00')
UNION
(SELECT 'b', id, timestamp FROM table_b WHERE client_id=1 AND timestamp >= '2014-04-18 00:00:00' AND timestamp < '2014-04-19 00:00:00')
ORDER BY timestamp;
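
Another absolute scheme, offered here only as a sketch, is keyset (seek) pagination: remember the timestamp of the last row on the previous page and ask each table only for rows after it. The literal '2014-04-18 09:30:00' and the `source` alias are placeholders, not part of the original schema. Each subquery should be able to use the (client_id, timestamp) index and stop after 100 rows.

```sql
-- Keyset pagination sketch (MySQL).
-- '2014-04-18 09:30:00' stands in for the timestamp of the last row
-- shown on the previous page; the application supplies the real value.
(SELECT 'a' AS source, id, timestamp
   FROM table_a
  WHERE client_id = 1 AND timestamp > '2014-04-18 09:30:00'
  ORDER BY timestamp
  LIMIT 100)
UNION ALL
(SELECT 'b' AS source, id, timestamp
   FROM table_b
  WHERE client_id = 1 AND timestamp > '2014-04-18 09:30:00'
  ORDER BY timestamp
  LIMIT 100)
ORDER BY timestamp
LIMIT 100;
-- Caveat: with a strict '>' comparison, rows that share the exact boundary
-- timestamp can be skipped; a tiebreaker column would be needed to avoid that.
```

UNION ALL is used because the 'a'/'b' literal already makes rows from the two tables distinct, so the duplicate-removal pass of a plain UNION adds nothing.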

However, it is also possible that your data is not fully normalized and that the common attributes from table_a and table_b should be in a third table. This pattern is called "joined table inheritance".

For example:

table_common
- id
- type ('a' or 'b')
- client_id
- timestamp
- primary key: (id, type) if id is not unique.
- index: (client_id, timestamp)

table_a
- id (same value as in table_common)
...

table_b
- id (same value as in table_common)
...
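
As a rough illustration, the shared table might be declared along these lines. The column types, the DATETIME choice, and the index name are assumptions made for the sketch; only the column names and the two keys come from the outline above.

```sql
-- Sketch only: types and the index name are assumed, not taken from the question.
CREATE TABLE table_common (
  id        BIGINT UNSIGNED NOT NULL,
  type      CHAR(1)         NOT NULL,   -- 'a' or 'b'
  client_id INT UNSIGNED    NOT NULL,
  timestamp DATETIME        NOT NULL,
  PRIMARY KEY (id, type),               -- composite, in case id is not unique across types
  KEY idx_client_timestamp (client_id, timestamp)
);
```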

Since you are now sharing a common index, you can do the following:

SELECT id, type, timestamp FROM table_common
WHERE client_id=1 ORDER BY timestamp LIMIT 100;

If you need more fields from the child tables, use LEFT OUTER JOIN and include type in the condition:

SELECT * FROM table_common
LEFT OUTER JOIN table_a ON table_common.type='a' AND table_common.id=table_a.id
LEFT OUTER JOIN table_b ON table_common.type='b' AND table_common.id=table_b.id
WHERE table_common.client_id=1
ORDER BY table_common.timestamp LIMIT 100;

One easy way to do this is to apply the same LIMIT to each of the individual queries, since the first N rows of the combined result can never include more than N rows from any one table:

(SELECT 'a', id, timestamp FROM table_a WHERE client_id=1 ORDER BY timestamp LIMIT 100)
UNION
(SELECT 'b', id, timestamp FROM table_b WHERE client_id=1 ORDER BY timestamp LIMIT 100)
ORDER BY timestamp
LIMIT 100;
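
For later pages the same idea can be extended, as long as each subquery fetches offset + N rows: the first M + N rows of the merged result can only come from the first M + N rows of each table. A sketch for the third page of 100 rows (offset 200); the numbers and the `source` alias are illustrative:

```sql
-- Per-table limit = offset (200) + page size (100) = 300.
(SELECT 'a' AS source, id, timestamp
   FROM table_a
  WHERE client_id = 1
  ORDER BY timestamp
  LIMIT 300)
UNION ALL
(SELECT 'b' AS source, id, timestamp
   FROM table_b
  WHERE client_id = 1
  ORDER BY timestamp
  LIMIT 300)
ORDER BY timestamp
LIMIT 200, 100;   -- skip 200 merged rows, return the next 100
```

The work per query grows with the offset, so this suits shallow pagination better than deep paging.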

