Index help for a MySQL query using greater-than operator and ORDER BY

I have a table with at least a couple million rows and an all-integer schema that looks roughly like this:

start
stop
first_user_id
second_user_id

The rows get pulled using the following queries:

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N  
   AND second_user_id=N 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N 
ORDER BY start ASC

I cannot figure out the best indexes to speed up these queries. The problem seems to be the ORDER BY, because when I take it out the queries are fast.

I've tried all different types of indexes using the standard index format:

ALTER TABLE tbl_name ADD INDEX index_name (index_col_1,index_col_2,...)

None of them seem to speed up the queries. Does anyone have any idea what index would work? Also, should I be trying a different type of index? I can't guarantee the uniqueness of each row, so I've avoided UNIQUE indexes.

Any guidance/help would be appreciated. Thanks!

Update: here is a list of the indexes. I didn't include this originally since I've taken a shotgun approach and added a ton of indexes looking for one that works:

start_index: [start, first_user_id, second_user_id]
stop_index: [stop, first_user_id, second_user_id]
F1_index: [first_user_id]
F2_index: [second_user_id]
F3_index: [another_id]
test_1_index: [first_user_id,stop,start]
test_2_index: [first_user_id,start,stop]
test_3_index: [start,stop,first_user_id,second_user_id]
test_4_index: [stop,first_user_id,second_user_id,start]
test_5_index: [stop,start]

And here is the EXPLAIN output:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: listing
type: index_merge
possible_keys: stop_index,F1_index,F3_index,test_1_index,test_2_index,test_4_index,test_5_index
key: F1_index,F3_index
key_len: 5,5
ref: NULL
rows: 238
Extra: Using intersect(F1_index,F3_index); Using where; Using filesort

Update for posterity

We ended up completely re-evaluating how we were querying the table and chose these indexes:

index_select_1: [first_user_id,start,stop]
index_select_2: [first_user_id,second_user_id,start,stop]

and then we select from the table with queries like these:

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N
   AND start >= M 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N   
   AND second_user_id=N
   AND start >= M 
ORDER BY start ASC
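
For readers who want to reproduce this, the two indexes listed above could be created roughly as follows. This is only a sketch: tbl_name, the column names, and the index names are taken from this post, and the comments describe the intended effect, not a measured result.

```sql
-- Sketch: index and column names are the ones listed in this post.
-- The equality columns come first, so within one first_user_id
-- (or one first_user_id/second_user_id pair) the index entries are
-- already ordered by start. That lets a range on start and
-- ORDER BY start be served from the same index.
ALTER TABLE tbl_name
  ADD INDEX index_select_1 (first_user_id, start, stop),
  ADD INDEX index_select_2 (first_user_id, second_user_id, start, stop);
```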

Thanks to everyone who answered; you really helped me think through the problem.

Could you make your sample table and EXPLAIN results match? They obviously do not describe the same situation, and from the EXPLAIN output alone we can't tell whether you made a mistake while abstracting your real query. If you don't want to reveal too much of the structure, go the other way: create the table structure you quoted and post the EXPLAIN result for that (maybe you will catch the problem that way).

Now one thing is certain: the sort is using filesort, which is bad.

To simplify (we'll come back to it): compound indexes that are useful for sorting need to have the sort field in front.

Example: idx(ID, Start)

ID      Start
1
        5
        8
        8
        10
        25
2
        3
        9
        10
        40
        41
        42
        42
...

In the above example the index is not much help for sorting unless you have a WHERE condition that limits ID to a single value.

But this exception is important, since you have single-value selectivity on one or both id fields.
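
The idx(ID, Start) example can be tried out with a small throwaway table. The table name t and its rows are made up for illustration (the rows mirror the listing above); what the optimizer actually does on such a tiny table may differ, so treat the comments as the expected behaviour on a realistically sized table and confirm with EXPLAIN.

```sql
-- Hypothetical demo table; only the index layout matters here.
CREATE TABLE t (
  ID    INT NOT NULL,
  Start INT NOT NULL,
  INDEX idx (ID, Start)
);

INSERT INTO t (ID, Start) VALUES
  (1, 5), (1, 8), (1, 8), (1, 10), (1, 25),
  (2, 3), (2, 9), (2, 10), (2, 40), (2, 41), (2, 42), (2, 42);

-- ID is pinned to one value: the index entries for ID = 2 are
-- already stored in Start order, so no extra sort is needed.
SELECT * FROM t WHERE ID = 2 ORDER BY Start;

-- ID spans several values: Start restarts for every ID, so the
-- index order is not the requested order and a sort is required.
SELECT * FROM t WHERE ID IN (1, 2) ORDER BY Start;
```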

So, of your indexes, the only ones that have start at the beginning are

start_index: [start, first_user_id, second_user_id]
test_3_index: [start,stop,first_user_id,second_user_id]

MySQL ignores the index

start_index: [start, first_user_id, second_user_id]

because it has better choices in terms of selectivity: with this index it would need to do an index scan, whereas it has other indexes that allow an index intersect, jumping directly to the (unsorted) results. It expects better selectivity from the intersect, and selectivity drives the planner.

Once the result is obtained, MySQL should realize that it could use another index to sort the results, but it seems that it cannot see how cheap that would be.

So to help the planner you could create an index that capitalizes on your single-value selectivity, such as:

two_ids_with_sort: [first_user_id, second_user_id, start]

I assume that the above would work very well on the query that has conditions on both ids, giving you access to presorted start record pointers. The following index should do the same for the query that filters on first_user_id only:

one_id_with_sort: [first_user_id, start]
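
As a rough sketch of how to test this, the two indexes could be added and the plan checked with EXPLAIN. The index names are the ones suggested above; the literals 42, 7 and 1000 are placeholders standing in for the N and M values of the question.

```sql
-- Index names as suggested above; tbl_name is the question's table.
ALTER TABLE tbl_name
  ADD INDEX two_ids_with_sort (first_user_id, second_user_id, start),
  ADD INDEX one_id_with_sort (first_user_id, start);

-- Placeholder values: 42 and 7 stand in for the ids, 1000 for M.
EXPLAIN
SELECT *
  FROM tbl_name
 WHERE stop >= 1000
   AND first_user_id = 42
   AND second_user_id = 7
 ORDER BY start ASC;
```

If the planner picks two_ids_with_sort for both the lookup and the ordering, the Extra column should no longer list "Using filesort"; the stop condition is then applied as a filter on the rows that were read.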

Only if you end up with a lot of records in the result sets would I look into indexing it further.

There are two paths there: a) adding the field stop to the end of the index, or b) creating two more similar indexes with stop instead of start (index intersect could be used there, and a wider range of queries could benefit from it).

But do test all of the above theories.

A couple of general suggestions:

  • write your most selective conditions first
  • when testing indexes, start with single-column indexes and then expand to compound indexes (for example, for sorting on start I would first add an index on start only)
  • too many indexes are not so good in MySQL, as the query planner cannot quickly run through all the possible combinations and cannot properly estimate the costs of all the operations (so it cuts corners, and the best index combination and plan might be left out)
  • therefore test indexes with USE INDEX FOR ORDER BY (index1) in your select to gauge the benefit of a certain index over the planner's own choice; see the MySQL documentation on index hints for more (especially the FORCE option). Also, aim to leave only the useful indexes and see whether the planner is then able to utilize them; if not, and only as a last resort, force the indexes in the queries for which performance is crucial. Keep in mind that this is bad practice in terms of administration and design.
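
A minimal sketch of the index-hint test from the last suggestion is shown below. It assumes an index named one_id_with_sort on (first_user_id, start) already exists, and uses 42 and 1000 as placeholder values for N and M.

```sql
-- Ask the planner to consider only this index for the ORDER BY step.
-- In MySQL's hint syntax the FOR ORDER BY part precedes the index list.
EXPLAIN
SELECT *
  FROM tbl_name USE INDEX FOR ORDER BY (one_id_with_sort)
 WHERE first_user_id = 42
   AND stop >= 1000
 ORDER BY start ASC;

-- Last resort: FORCE INDEX makes a table scan look very expensive,
-- so the named index is used whenever it is usable at all.
EXPLAIN
SELECT *
  FROM tbl_name FORCE INDEX (one_id_with_sort)
 WHERE first_user_id = 42
   AND stop >= 1000
 ORDER BY start ASC;
```

Comparing the two EXPLAIN outputs against the unhinted query shows whether the index helps before any hint is committed to application code.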

Try to avoid putting a range condition (e.g. >, >=, <, <=) on the leftmost column of a compound index. MySQL cannot use the index columns to the right of a range column to narrow the lookup.

At first glance I would say to at least create an index on (first_user_id, stop, second_user_id). Then specify the query accordingly:

select * from tbl_name where first_user_id=N and stop >= M and second_user_id=N

UPDATE: D'oh, so I completely contradicted myself in the above suggestion, since incorporating second_user_id into the index is useless when it comes after the stop range column. So let's try this again.

ALTER TABLE tbl_name ADD INDEX index_1 (first_user_id, stop);
ALTER TABLE tbl_name ADD INDEX index_2 (first_user_id, second_user_id, stop);

The strange thing is that, according to the EXPLAIN output, your query only deals with an estimated 238 rows (?)

Since you stated that the query is faster without the ORDER BY, may I suggest that you do the sort after the query?
That may be the quickest way to fix the problem.

Also, don't forget to remove unused indexes afterwards :)


edit

That's a wild guess (because I'm not sure MySQL won't rewrite the query back to its current form), but you could try the following:

SELECT * FROM (
    SELECT * 
      FROM tbl_name 
     WHERE stop >= M 
       AND first_user_id=N 
    ) AS derived
ORDER BY start ASC
