Two different queries on the same table with the same WHERE clause

Question

I have two different queries, but both run against the same table and both have the same WHERE clause, so they select the same rows.

Query 1:

SELECT HOUR(timestamp), COUNT(*) as hits 
FROM hits_table 
WHERE timestamp >= CURDATE() 
GROUP BY HOUR(timestamp)

Query 2:

SELECT country, COUNT(*) as hits 
FROM hits_table 
WHERE timestamp >= CURDATE()
GROUP BY country

How can I make this more efficient?

Answer 1

If this table is indexed correctly, it honestly doesn't matter how big the entire table is, because you're only looking at today's rows.

If the table is indexed incorrectly, the performance of these queries will be terrible no matter what you do.

Your WHERE timestamp >= CURDATE() clause means you need an index on the timestamp column. In one of your queries, the GROUP BY country shows that a compound covering index on (timestamp, country) will be a great help.

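So a single compound index on (timestamp, country) will satisfy both of the queries in your question. Query 1 reads only timestamp, which is the leading column, and Query 2 reads timestamp and country, so the index contains every column either query needs.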
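As a minimal sketch, the compound index could be created like this. The index name idx_timestamp_country is a placeholder chosen for illustration; the table and column names come from the question.

```sql
-- idx_timestamp_country is a hypothetical name; pick whatever fits your schema.
-- timestamp comes first because the WHERE clause selects a range on it;
-- country is included so Query 2 can be answered from the index alone.
CREATE INDEX idx_timestamp_country
    ON hits_table (timestamp, country);
```

Building an index on a large table can take a while, so it is probably worth trying on a copy of the data first.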

Let's explain how that works. To look for today's records (or indeed any records within a particular timestamp range), group them by country, and count them, MySQL can satisfy the query by doing these steps:

1. Random-access the index to the first record that matches the timestamp. O(log n).
2. Grab the first country value from the index.
3. Scan to the next country value in the index and count. O(n).
4. Repeat step three until the end of the timestamp range.

This index scan operation is about as fast as a team of ace developers (the MySQL team) can make it with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
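One way to check whether MySQL really answers the query from the index is to prefix it with EXPLAIN. This is only a sketch and assumes a compound index on (timestamp, country) already exists on hits_table; the exact plan depends on your MySQL version and data.

```sql
-- Assumes a compound index on (timestamp, country) already exists.
-- EXPLAIN shows the execution plan without running the query.
EXPLAIN
SELECT country, COUNT(*) AS hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country;
```

In the output, the key column names the index MySQL chose, and "Using index" in the Extra column indicates that the query is served from the index without reading the table rows.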

If you run these queries one right after the other, it's possible that MySQL will still have some or all of the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.

Do you see how your example queries lead with timestamp? The most important WHERE criterion chooses a timestamp range. That's why the compound index I suggested has timestamp as its first column. If you don't have any queries that lead with country, your simple index on that column is probably useless.
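To see which indexes the table currently has before deciding whether the one on country is worth keeping, something like the following may help. The name idx_country is purely hypothetical, and the DROP statement is left commented out on purpose.

```sql
-- List every index on the table; Seq_in_index gives each column's position.
SHOW INDEX FROM hits_table;

-- idx_country is a hypothetical name; substitute the Key_name reported above,
-- and only drop the index once you are sure no other query relies on it.
-- DROP INDEX idx_country ON hits_table;
```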

You asked whether you really need compound covering indexes. You should probably read about how they work and make that decision for yourself.

There's obviously a tradeoff in choosing indexes. Each index slows INSERT and UPDATE operations a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.

Answer 2

Since the two queries have different GROUP BY clauses, they are inherently different and cannot be combined. Assuming there is already an index on the timestamp field, there is no straightforward way to make this more efficient.

If the dataset is huge (10 million or more rows), you might get a little extra efficiency from adding a combined index on (country, timestamp), but the gain is unlikely to be measurable. Its absence will usually be mitigated by MySQL's own in-memory buffering if the two queries are executed one directly after the other.


Answer 1: 2, accepted, 2014-06-21 12:28:42
Answer 2: 0, 2014-06-21 10:29:05
