Two different queries on the same table with the same WHERE clause
I have two different queries, but both run on the same table and both have the same WHERE clause, so they select the same rows.
Query 1:
SELECT HOUR(timestamp), COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY HOUR(timestamp)
Query 2:
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country
How can I make this more efficient?
If this table is indexed correctly, it honestly doesn't matter how big the entire table is, because you're only looking at today's rows.
If the table is indexed incorrectly, the performance of these queries will be terrible no matter what you do.
Your WHERE timestamp >= CURDATE() clause means you need to have an index on the timestamp column. In one of your queries, the GROUP BY country shows that a compound covering index on (timestamp, country) will be a great help.
So, a single compound index (timestamp, country) will satisfy both the queries in your question.
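As a minimal sketch, the compound index could be created like this. The table and column names come from the question; the index name idx_timestamp_country is a hypothetical choice, not something from the original post.

```sql
-- Compound index with timestamp first, then country.
-- The index name is hypothetical; hits_table, timestamp and country
-- are the names used in the question.
CREATE INDEX idx_timestamp_country ON hits_table (timestamp, country);
```

Query 1 only needs the leading timestamp column of this index, while Query 2 can read both timestamp and country from it without touching the table rows.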
Let's explain how that works. To look for today's records (or indeed any records starting and ending with particular timestamp values), group them by country, and count them, MySQL can satisfy the query by doing these steps:
1. Seek in the index to the first record that matches the timestamp. O(log n).
2. Read the country value from the index.
3. Scan to the next country value in the index and count. O(n).
4. Continue until the end of the timestamp range.
This index scan operation is about as fast as a team of ace developers (the MySQL team) can get it to be with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
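One way to check whether MySQL really answers a query from the index alone is to prefix it with EXPLAIN. The sketch below assumes a suitable index on (timestamp, country) already exists on hits_table; the exact plan depends on your MySQL version and data.

```sql
-- Ask MySQL how it plans to run Query 2 from the question.
EXPLAIN
SELECT country, COUNT(*) AS hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country;

-- Same check for Query 1.
EXPLAIN
SELECT HOUR(timestamp), COUNT(*) AS hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY HOUR(timestamp);
```

In the output, the key column names the index the optimizer chose, and "Using index" in the Extra column indicates that the query is served from the index without reading the table rows. Other notes may appear in Extra as well, depending on how the grouping is carried out.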
If you run one of these queries right after the other, it's possible that MySQL will still have some or all of the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.
Do you see how your example queries lead with timestamp? The most important WHERE criterion chooses a timestamp range. That's why the compound index I suggested has timestamp as its first column. If you don't have any queries that lead with country, your simple index on that column is probably useless.
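To see which indexes the table currently has before deciding whether one is redundant, something like the following could be used. The index name in the commented-out DROP statement is hypothetical; substitute whatever name SHOW INDEX reports.

```sql
-- List every index on the table. Key_name, Seq_in_index and Column_name
-- show which columns each index covers and in what order.
SHOW INDEX FROM hits_table;

-- Only if no query leads with country: drop the single-column index.
-- idx_country is a hypothetical name; left commented out on purpose.
-- DROP INDEX idx_country ON hits_table;
```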
You asked whether you really need compound covering indexes. You probably should read about how they work and make that decision for yourself.
There's obviously a tradeoff in choosing indexes. Each index slows down INSERT and UPDATE operations a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.
Since the two queries have different GROUP BY clauses, they are inherently different and cannot be combined. Assuming there is already an index on the timestamp field, there is no straightforward way to make this more efficient.
If the dataset is huge (10 million or more rows), you might get a little extra efficiency from an additional combined index on (country, timestamp), but the gain is unlikely to be measurable. Its absence will usually be mitigated by MySQL's own in-memory buffering if the two queries are executed directly one after the other.
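If you want to try that combined index and judge the effect yourself, a rough sketch follows. The index name idx_country_timestamp is a hypothetical choice; whether the optimizer actually uses the index depends on your data and MySQL version.

```sql
-- Combined index in (country, timestamp) order, as suggested above.
-- The index name is hypothetical.
CREATE INDEX idx_country_timestamp ON hits_table (country, timestamp);

-- Check whether the optimizer picks it for the per-country query.
EXPLAIN
SELECT country, COUNT(*) AS hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country;
```

Comparing the key and rows columns of the EXPLAIN output before and after creating the index gives a first indication of whether it is worth its write overhead.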