I have two different queries. But they are both on the same table and have both the same WHERE
clause. So they are selecting the same row.
Query 1:
SELECT HOUR(timestamp), COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY HOUR(timestamp)
Query 2:
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country
How can I make this more efficient?
If this table is indexed correctly, it honestly doesn't matter how big the entire table is because you're only looking at today's rows.
If the table is indexed incorrectly the performance of these queries will be terrible no matter what you do.
Your WHERE timestamp >= CURDATE()
clause means you need to have an index on the timestamp
column. In one of your queries the GROUP BY country
shows that a compound covering index on (timestamp, country)
will be a great help.
So, a single compound index (timestamp, country)
will satisfy both the queries in your question.
Let's explain how that works. To look for today's records (or indeed any records starting and ending with particular timestamp
values) and group them by country, and count them, MySQL can satisfy the query by doing these steps:
timestamp
. O(log n). country
value from the index . country
value in the index and count. O(n). timestamp
range. This index scan operation is about as fast as a team of ace developers (the MySQL team) can get it to be with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
If you run one of these queries right after the other, it's possible that MySQL will still have some or all the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.
Do you see how your example queries lead with timestamp
? The most important WHERE
criterion chooses a timestamp range. That's why the compound index I suggested has timestamp
as its first column. If you don't have any queries that lead with country
your simple index on that column probably is useless.
You asked whether you really need compound covering indexes. You probably should read about how they work and make that decision for yourself.
There's obviously a tradeoff in choosing indexes. Each index slows the process of INSERT
and UPDATE
a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.
Since both queries have different GROUP BY
clauses they are inherently different and cannot be combined. Assuming there already is an index present on the timestamp
field there is no straightforward way to make this more efficient.
If the dataset is huge (10 million or more rows) you might get a little extra efficiency out of making an extra combined index on country, timestamp
, but that's unlikely to be measurable, and the lack of it will usually be mitigated by in-memory buffering of MySQL itself if these 2 queries are executed directly after another.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.