
Very slow MySQL query performance

I have a query that takes about 18 seconds to finish:

THE QUERY:

SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(p.owner_id)
FROM commission c
  INNER JOIN partner p ON c.customer_id = p.id
WHERE (c.date BETWEEN '2018-01-01' AND '2018-12-31')
  AND (c.company_id = 90)
  AND (c.source = 'ACTUAL')
  AND (p.id IN (3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071,
    3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3081, 3082, 3083, 3084,
    3085, 3086, 3087, 3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096,
    3097, 3098, 3099, 3448, 3449, 3450, 3451, 3452, 3453, 3454, 3455, 3456,
    3457, 3458, 3459, 3460, 3461, 3471, 3490, 3491, 6307, 6368, 6421))
  GROUP BY YEAR(c.date), MONTH(c.date), p.district_id

The commission table has around 2.8 million records, of which 860,000+ belong to the current year, 2018. The partner table currently has 8,600+ records.

RESULT

| `YEAR(c.date)` | `MONTH(c.date)` | district_id | `COUNT(c.id)` | 
|----------------|-----------------|-------------|---------------| 
| 2018           | 1               | 1           | 19154         | 
| 2018           | 1               | 5           | 9184          | 
| 2018           | 1               | 6           | 2706          | 
| 2018           | 1               | 12          | 36296         | 
| 2018           | 1               | 15          | 13085         | 
| 2018           | 2               | 1           | 21231         | 
| 2018           | 2               | 5           | 10242         | 
| ...            | ...             | ...         | ...           | 

55 rows retrieved starting from 1 in 18 s 374 ms 
(execution: 18 s 368 ms, fetching: 6 ms)

EXPLAIN:

| id | select_type | table | partitions | type  | possible_keys                                                                                        | key                  | key_len | ref             | rows | filtered | extra                                        | 
|----|-------------|-------|------------|-------|------------------------------------------------------------------------------------------------------|----------------------|---------|-----------------|------|----------|----------------------------------------------| 
| 1  | SIMPLE      | p     | null       | range | PRIMARY                                                                                              | PRIMARY              | 4       |                 | 57   | 100      | Using where; Using temporary; Using filesort | 
| 1  | SIMPLE      | c     | null       | ref   | UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73,IDX_6F7146F09395C3F3,IDX_6F7146F0979B1AD6,IDX_6F7146F0AA9E377A | IDX_6F7146F09395C3F3 | 5       | p.id            | 6716 | 8.33     | Using where                                  | 

DDL:

create table if not exists commission (
    id int auto_increment
        primary key,
    date date not null,
    source enum('ACTUAL', 'EXPECTED') not null,
    customer_id int null,
    transaction_id varchar(255) not null,
    company_id int null,
    constraint UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73 unique (company_id, transaction_id, source),
    constraint FK_6F7146F09395C3F3 foreign key (customer_id) references partner (id),
    constraint FK_6F7146F0979B1AD6 foreign key (company_id) references companies (id)
) collate=utf8_unicode_ci;
create index IDX_6F7146F09395C3F3 on commission (customer_id);
create index IDX_6F7146F0979B1AD6 on commission (company_id);
create index IDX_6F7146F0AA9E377A on commission (date);

I noticed that after removing the partner IN condition, MySQL takes only 3 s. I tried replacing it with something crazy like this:

AND (',3062,3063,3064,3065,3066,3067,3068,3069,3070,3071,3072,3073,3074,3075,3076,3077,3078,3079,3081,3082,3083,3084,3085,3086,3087,3088,3089,3090,3091,3092,3093,3094,3095,3096,3097,3098,3099,3448,3449,3450,3451,3452,3453,3454,3455,3456,3457,3458,3459,3460,3461,3471,3490,3491,6307,6368,6421,'
     LIKE CONCAT('%,', p.id, ',%')) 

and the result was about 5 s... great! But it's a hack.

Why does this query take so long to execute when I use the IN condition? Workarounds, tips, links, etc. are welcome. Thanks!

MySQL generally uses only one index per table in a query. For this query you need a compound index covering the columns used in the search. Columns that the WHERE clause compares with constants should come before columns tested as a range, for example:

ALTER TABLE commission
DROP INDEX IDX_6F7146F0979B1AD6,
ADD INDEX IDX_6F7146F0979B1AD6 (company_id, source, date)

Here's what the Optimizer sees in your query.

Checking whether to use an index for the GROUP BY:

  • Functions ( YEAR() ) in the GROUP BY, so no.
  • Multiple tables ( c and p ) mentioned, so no.

For a JOIN, the Optimizer will (almost always) start with one table, then reach into the other. So, let's look at the two options:

If starting with p:

Assuming you have PRIMARY KEY(id), there is not much to think about. It will simply use that index.

For each row selected from p, it will then look into c, and any variation of this INDEX would be optimal.

c: INDEX(company_id, source, customer_id,  -- in any order (all are tested "=")
         date)       -- last, since it is tested as a range

If starting with c:

c: INDEX(company_id, source,  -- in any order (all are tested "=")
         date)       -- last, since it is tested as a range
-- slightly better:
c: INDEX(company_id, source,  -- in any order (all are tested "=")
         date,       -- last, since it is tested as a range
         customer_id)  -- really last -- added only to make it "covering".

The Optimizer will look at "statistics" to crudely decide which table to start with. So, add all the indexes I suggested.
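As a rough sketch, the two candidate indexes on c could be created as follows. The index names are made up for illustration; the column lists are the ones suggested above, so the Optimizer can choose between them whichever table it starts with.

```sql
-- Hypothetical index names; only the column order matters here.

-- Candidate for plans that start with p and then look into c
-- (company_id, source and customer_id are all tested with "="):
CREATE INDEX idx_c_company_source_customer_date
    ON commission (company_id, source, customer_id, date);

-- Candidate for plans that start with c (the "covering" variant):
CREATE INDEX idx_c_company_source_date_customer
    ON commission (company_id, source, date, customer_id);
```

After adding them, running EXPLAIN on the original query again should show which of the two, if either, the Optimizer actually picks.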

A "covering" index is one that contains all the columns needed anywhere in the query. It is sometimes wise to extend a 'good' index with more columns to make it "covering".

But there is a monkey wrench in here. c.customer_id = p.id means that customer_id IN (...) effectively exists. But now there are two "range-like" constraints -- one is an IN, the other is a 'range'. In some newer versions, the Optimizer will happily jump around due to the IN and still be able to do "range" scans. So, I recommend this ordering:

  1. Test(s) of column = constant
  2. Test(s) with IN
  3. One 'range' test ( BETWEEN, >=, LIKE with trailing wildcard, etc)
  4. Perhaps add more columns to make it "covering" -- but don't do this step if you end up with more than, say, 5 columns in the index.

Hence, for c, the following is optimal for the WHERE, and happens to be "covering".

INDEX(company_id, source,  -- first, but in any order (all "=")
      customer_id,  -- "IN"
      date)       -- last, since it is tested as a range

p: (same as above)

Since there was an IN or "range", there is no use seeing if the index can also handle the GROUP BY.

A note on COUNT(x) -- it checks that x is NOT NULL. It is usually just as correct to say COUNT(*), which counts the number of rows without any extra checking.
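Applied to the query from the question, that would look like the sketch below. It assumes p.owner_id is never NULL (the partner DDL is not shown); if it can be NULL, COUNT(*) and COUNT(p.owner_id) return different numbers and the two queries are not interchangeable.

```sql
-- Same query as in the question, with COUNT(*) instead of COUNT(p.owner_id).
-- Assumption: p.owner_id is declared NOT NULL, so the counts are identical.
SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(*)
FROM commission c
  INNER JOIN partner p ON c.customer_id = p.id
WHERE (c.date BETWEEN '2018-01-01' AND '2018-12-31')
  AND (c.company_id = 90)
  AND (c.source = 'ACTUAL')
  AND (p.id IN (3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071,
    3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3081, 3082, 3083, 3084,
    3085, 3086, 3087, 3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096,
    3097, 3098, 3099, 3448, 3449, 3450, 3451, 3452, 3453, 3454, 3455, 3456,
    3457, 3458, 3459, 3460, 3461, 3471, 3490, 3491, 6307, 6368, 6421))
GROUP BY YEAR(c.date), MONTH(c.date), p.district_id;
```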

This is a non-starter since it hides the indexed column ( id ) in a function:

AND (',3062,3063,3064,3065,3066,...6368,6421,'
     LIKE CONCAT('%,', p.id, ',%'))

With your LIKE hack you are tricking the optimizer into using a different plan (most probably one that uses the IDX_6F7146F0AA9E377A index first). You should be able to see this in the EXPLAIN output.
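One way to check this is to run EXPLAIN on the LIKE variant and compare its table order and key column with the plan shown in the question. A sketch, using the question's query unchanged apart from the LIKE condition:

```sql
-- EXPLAIN only reports the chosen plan; it does not execute the query.
EXPLAIN
SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(p.owner_id)
FROM commission c
  INNER JOIN partner p ON c.customer_id = p.id
WHERE (c.date BETWEEN '2018-01-01' AND '2018-12-31')
  AND (c.company_id = 90)
  AND (c.source = 'ACTUAL')
  AND (',3062,3063,3064,3065,3066,3067,3068,3069,3070,3071,3072,3073,3074,3075,3076,3077,3078,3079,3081,3082,3083,3084,3085,3086,3087,3088,3089,3090,3091,3092,3093,3094,3095,3096,3097,3098,3099,3448,3449,3450,3451,3452,3453,3454,3455,3456,3457,3458,3459,3460,3461,3471,3490,3491,6307,6368,6421,'
       LIKE CONCAT('%,', p.id, ',%'))
GROUP BY YEAR(c.date), MONTH(c.date), p.district_id;
```

If the guess above is right, c would appear first in the output with IDX_6F7146F0AA9E377A in its key column, instead of p with PRIMARY.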

I think the real issue in your case is the second line of the EXPLAIN output: the server executes multiple functions (MONTH, YEAR) for 6716 rows and then tries to group all these rows. During this time all 6716 rows have to be stored (in memory or on disk, depending on your server configuration).

SELECT COUNT(*) FROM commission WHERE (date BETWEEN '2018-01-01' AND '2018-12-31') AND company_id = 90 AND source = 'ACTUAL';

=> How many rows are we talking about?

If the number returned by the query above is much lower than 6716, I'd try adding a covering index on the columns customer_id, company_id, source and date. I'm not sure about the best column order, as it depends on your data (check the cardinality of these columns). I'd start with an index on (date, company_id, source, customer_id). I'd also add a unique index on (id, district_id, owner_id) to partner.
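A minimal sketch of those steps follows. The index names are invented for illustration, and the partner index assumes the table has the district_id and owner_id columns that the query refers to (its DDL is not shown).

```sql
-- Inspect the existing indexes; the Cardinality column gives a rough
-- estimate of the number of distinct values per indexed column.
SHOW INDEX FROM commission;

-- Hypothetical name; starting column order as suggested above.
CREATE INDEX idx_commission_date_company_source_customer
    ON commission (date, company_id, source, customer_id);

-- Hypothetical name; lets the join read district_id and owner_id
-- from the index without touching the partner rows.
CREATE UNIQUE INDEX uniq_partner_id_district_owner
    ON partner (id, district_id, owner_id);
```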

It is also possible to add stored generated columns _year and _month (if your server is a bit old, you can add normal columns and fill them in with a trigger) to get rid of the repeated function executions.
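As a sketch, on a server with generated-column support (MySQL 5.7 or later) the two columns could be added like this. The column types are an assumption chosen to fit year and month values.

```sql
-- Stored generated columns: computed once when a row is written,
-- so YEAR()/MONTH() no longer run for every row at query time.
ALTER TABLE commission
  ADD COLUMN _year  SMALLINT GENERATED ALWAYS AS (YEAR(`date`))  STORED,
  ADD COLUMN _month TINYINT  GENERATED ALWAYS AS (MONTH(`date`)) STORED;
```

The query could then select and group by c._year and c._month instead of YEAR(c.date) and MONTH(c.date).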
