
ORDER BY query is taking too much time

I have more than 80k customers, divided into 4 groups. I want to fetch the users of 2 of those groups with a MySQL query. My query is below:

select c.customers_firstname as recipient_firstname, 
       c.customers_lastname as  recipient_lastname,
       c.customers_id as recipient_id, 
       c.customers_email_address as recipient_email_address 
from customers c
where customers_group_id = '1' OR customers_group_id = '3'

When I run this query in phpMyAdmin, I get: Showing rows 0 - 29 (59,815 total, Query took 0.0034 sec)

But when I add ORDER BY recipient_firstname ASC to this query, I get: Showing rows 0 - 29 (59,815 total, Query took 0.2607 sec)

The query with ORDER BY takes too long to return its result.

I want to reduce the time the ORDER BY adds to the query.

Please help if there is another way to get the same result in less time.

You need an index on the recipient_firstname field (so really customers.customers_firstname). An index allows for an ordered, linear-time iteration over the result set.

If you don't have an index, the whole result set must be collected first and then sorted. That sort is O(n log n), which is noticeably slow for large sets, and if the set can't fit into memory (60k records might not, depending on configuration), MySQL falls back to a very slow file-based sort.

tl;dr You need an index. An index on recipient_firstname (that is, customers_firstname) will bring the query extremely close in performance to the version without ORDER BY.
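
One way to check whether the sort is being done without help from an index is to prefix the query with EXPLAIN. This is a minimal sketch that reuses the query from the question; the exact plan depends on your MySQL version, data, and existing indexes. When MySQL has to sort the rows itself, the Extra column of the output contains "Using filesort".

```sql
-- Ask MySQL how it plans to execute the query from the question.
-- "Using filesort" in the Extra column means the rows are sorted
-- after being fetched rather than read in index order.
EXPLAIN
SELECT c.customers_firstname     AS recipient_firstname,
       c.customers_lastname      AS recipient_lastname,
       c.customers_id            AS recipient_id,
       c.customers_email_address AS recipient_email_address
FROM customers c
WHERE c.customers_group_id = '1' OR c.customers_group_id = '3'
ORDER BY recipient_firstname ASC;
```

Running the same EXPLAIN again after adding an index shows whether the optimizer actually picked it up.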


By the way, if customers_group_id is an integer field, use integer literals, not strings. It likely won't make a difference, but quoting the values is misleading, and there are a few situations where it does matter.


Depending on the situation, it's probably also worth putting an index on the group id. For small sets, the rows can simply be filtered as the result set is built, but for large result sets that ends up requiring a rather disk-heavy full table scan.
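
As a sketch of that suggestion: the index name below is hypothetical, and it is worth listing the existing indexes first, since the column may already be indexed.

```sql
-- List the indexes that already exist on the table.
SHOW INDEX FROM customers;

-- Hypothetical index name; only needed if customers_group_id
-- is not already the leading column of some index.
CREATE INDEX idx_customers_group_id
    ON customers (customers_group_id);
```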

You have to index the customers_firstname field: this will speed up the ORDER BY, but it will also slow down the WHERE (which is probably served by an index now), because MySQL will typically use only one index per table for a query like this.

So the index must be on customers_group_id, customers_firstname, in this order.

CREATE INDEX my_query_ndx 
    ON customers ( customers_group_id, customers_firstname );

In theory you might enlarge the index into a covering index that contains, after the two key fields, all the other fields you require in the SELECT. Maintaining this kind of index is expensive, though; you'll have to balance advantages and drawbacks. If the table is very "wide", it might be advantageous to index on group id, firstname, lastname, id and email.
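
A covering index along those lines might look like the sketch below. The index name is hypothetical, and whether MySQL accepts the definition depends on the column types and lengths, since index keys have a maximum size; long VARCHAR columns may need prefix lengths.

```sql
-- Hypothetical name. The two key fields come first, followed by the
-- remaining columns the SELECT reads, so the query can be answered
-- from the index alone.
CREATE INDEX idx_customers_group_name_covering
    ON customers (
        customers_group_id,
        customers_firstname,
        customers_lastname,
        customers_id,
        customers_email_address
    );
```

If the index does cover the query, EXPLAIN reports "Using index" in the Extra column.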

Small (or not so small) query improvements

where customers_group_id = '1' OR customers_group_id = '3'

This can be rewritten for clarity (it changes nothing) as

WHERE customers_group_id IN ('1','3')

But now, either customers_group_id is an integer field, or it isn't. If it is, then it's better to treat it as such:

WHERE customers_group_id IN (1, 3)
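
Put together with the sort from the question, the full statement might read as follows. This is only a sketch and assumes customers_group_id really is an integer column.

```sql
-- Same result as the original query, with IN and integer literals.
SELECT c.customers_firstname     AS recipient_firstname,
       c.customers_lastname      AS recipient_lastname,
       c.customers_id            AS recipient_id,
       c.customers_email_address AS recipient_email_address
FROM customers c
WHERE c.customers_group_id IN (1, 3)
ORDER BY recipient_firstname ASC;
```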

In some cases, you can plan your IDs ahead so that, for example, group 3 is actually group 2, i.e., the groups you are likely to be interested in are contiguous. That way, you can rewrite the condition as variable < value, variable > value, or variable BETWEEN, which can be about twice as fast as an OR. With large OR sets, 4x speedups are within easy reach.
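
For illustration only, suppose the two groups of interest had been renumbered to 1 and 2 (an assumption; in the question they are 1 and 3). The filter then becomes a single range:

```sql
-- Assumes the wanted groups are contiguous (1 and 2 here), so one
-- range condition replaces the OR / IN list.
SELECT c.customers_firstname     AS recipient_firstname,
       c.customers_lastname      AS recipient_lastname,
       c.customers_id            AS recipient_id,
       c.customers_email_address AS recipient_email_address
FROM customers c
WHERE c.customers_group_id BETWEEN 1 AND 2
ORDER BY recipient_firstname ASC;
```

Whether this is actually faster on a given table is best confirmed by timing both forms.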

If it is not an integer field, then by all means strive to make it one. Performance and index size will both benefit greatly from an integer column (note, however, that with strings, '3' is greater than '12', just as 'C' is greater than 'AB'; so the type conversion is not necessarily without side effects).
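
The ordering difference is easy to see directly in MySQL with two literal comparisons:

```sql
-- Strings are compared character by character, numbers by value.
SELECT '3' > '12' AS string_comparison,   -- 1 (true): '3' sorts after '1'
       3 > 12     AS integer_comparison;  -- 0 (false)
```

Any range condition or ORDER BY on the group id can therefore behave differently after the column is converted.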

Try creating an index on (customers_group_id, customers_firstname); this should work.

You need to create an index on the column that the ORDER BY clause is applied to:

CREATE INDEX index_name ON customers (customers_firstname);
