Two different queries on the same table with the same WHERE clause

Question

I have two different queries, but both run against the same table and use the same WHERE clause, so they select the same rows.

Query 1:

SELECT HOUR(timestamp), COUNT(*) as hits 
FROM hits_table 
WHERE timestamp >= CURDATE() 
GROUP BY HOUR(timestamp)

Query 2:

SELECT country, COUNT(*) as hits 
FROM hits_table 
WHERE timestamp >= CURDATE()
GROUP BY country

How can I make this more efficient?

Answer 1

If this table is indexed correctly, it honestly doesn't matter how big the entire table is because you're only looking at today's rows.

If the table is indexed incorrectly the performance of these queries will be terrible no matter what you do.

Your WHERE timestamp >= CURDATE() clause means you need an index on the timestamp column. Your second query also groups by country, so a compound covering index on (timestamp, country) will be a great help.

So, a single compound index (timestamp, country) will satisfy both the queries in your question.
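As a minimal sketch, that index could be created like this. The table and column names come from the question; the index name idx_timestamp_country is made up for illustration.

```sql
-- Compound index with the range column first, then the grouped column.
-- idx_timestamp_country is a hypothetical name; choose your own.
CREATE INDEX idx_timestamp_country
    ON hits_table (timestamp, country);
```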

Let's explain how that works. To look for today's records (or indeed any records starting and ending with particular timestamp values) and group them by country, and count them, MySQL can satisfy the query by doing these steps:

1. Random-access the index to the first record that matches the timestamp. O(log n).
2. Grab the first country value from the index.
3. Scan to the next country value in the index and count. O(n).
4. Repeat step three until the end of the timestamp range.

This index scan operation is about as fast as a team of ace developers (the MySQL team) can get it to be with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
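One way to check whether MySQL really answers the query from the index alone is to look at the query plan. This is only a sketch; the exact output depends on your MySQL version and your data.

```sql
-- Prefix the query with EXPLAIN to see the plan MySQL chooses
-- instead of running the query itself.
EXPLAIN
SELECT country, COUNT(*) AS hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country;
```

If the key column of the output names the compound index and the Extra column includes "Using index", the rows are being read from the index without touching the table. Extra may also list further steps, such as "Using temporary" for the grouping.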

If you run one of these queries right after the other, it's possible that MySQL will still have some or all the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.

Do you see how your example queries lead with timestamp? The most important WHERE criterion chooses a timestamp range. That's why the compound index I suggested has timestamp as its first column. If you don't have any queries that lead with country, your simple index on that column is probably useless.

You asked whether you really need compound covering indexes. You probably should read about how they work and make that decision for yourself.

There's obviously a tradeoff in choosing indexes. Each index slows the process of INSERT and UPDATE a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.
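Before weighing those tradeoffs it helps to see which indexes the table already carries. A small sketch, using the table name from the question:

```sql
-- List every index currently defined on the table.
-- Each output row describes one column of one index.
SHOW INDEX FROM hits_table;
```

In the output, Key_name identifies the index and Seq_in_index gives the position of the column within a compound index, so you can see which column each index leads with.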

Answer 2

Since the two queries have different GROUP BY clauses, they are inherently different and cannot be combined. Assuming there is already an index on the timestamp field, there is no straightforward way to make this more efficient.

If the dataset is huge (10 million or more rows), you might get a little extra efficiency out of an additional combined index on (country, timestamp), but that's unlikely to be measurable, and its absence will usually be mitigated by MySQL's own in-memory buffering if these two queries are executed directly one after the other.
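If you want to measure that rather than guess, a rough sketch is to create the index and compare query plans. The index name idx_country_timestamp is hypothetical, and whether the optimizer would pick this index on its own depends on your data and MySQL version.

```sql
-- Hypothetical index with the columns in the order suggested above.
CREATE INDEX idx_country_timestamp
    ON hits_table (country, timestamp);

-- Ask MySQL to plan the query with that index so the plan
-- can be compared against the one it chooses by default.
EXPLAIN
SELECT country, COUNT(*) AS hits
FROM hits_table FORCE INDEX (idx_country_timestamp)
WHERE timestamp >= CURDATE()
GROUP BY country;
```

Comparing the rows estimate and the Extra column with and without the hint gives a first indication of whether the extra index earns its write cost.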

Answer 1: score 2, accepted, 2014-06-21 12:28:42
Answer 2: score 0, 2014-06-21 10:29:05
