MySQL ignoring index for count query on INNODB table in MySQL 5.7

We've got an old database running MySQL 5.1. We now want to migrate it to MySQL 5.7, but some queries that worked fine are suddenly very slow (slower by a factor of 60 or more).

The INNODB table in question (EVENT) has, among other columns, a COMPANY_ID (foreign key to a COMPANY table) and an EVENT_DATETIME of type DATETIME. There is an index on (COMPANY_ID, EVENT_DATETIME), and for testing I've added one on (EVENT_DATETIME, COMPANY_ID). Currently almost all EVENTs have COMPANY_ID 1, but this will change.

We have a count query for the number of events in the last year:

select count(distinct this_.EVENT_ID) as y0_ from EVENT this_
       where this_.EVENT_DATETIME>='2018-10-22 00:00:00'
         and this_.EVENT_DATETIME<='2019-11-21 00:00:00'
         and this_.COMPANY_ID = 1;

The result is around 1,000,000 rows. The query used to take about 1.5 seconds; now it takes up to 100 seconds. While MySQL 5.1 uses the index on COMPANY_ID and EVENT_DATETIME for this query, MySQL 5.7 ignores it. It seems that when MySQL 5.7 estimates it has to read too many rows, it gives up on that index even though it would help. If I reduce the window to, e.g., 10 months, MySQL 5.7 uses an index again.

So on MySQL 5.1 the index on (COMPANY_ID, EVENT_DATETIME) is used, while MySQL 5.7 only uses the foreign key index on COMPANY_ID.

If I run the query without the COMPANY_ID condition in the WHERE clause

select count(distinct this_.EVENT_ID) as y0_ from EVENT this_ 
       where this_.EVENT_DATETIME>='2018-10-22 00:00:00'
         and this_.EVENT_DATETIME<='2019-11-21 00:00:00';

the query is a lot faster.

Is there any way to force MySQL 5.7 to use a certain index?

If I rewrite the query to this:

select count(distinct this_.EVENT_ID) as y0_ from EVENT this_
     where this_.EVENT_DATETIME>='2018-10-22 00:00:00'
       and this_.EVENT_DATETIME<='2019-11-21 00:00:00'
     GROUP BY COMPANY_ID HAVING COMPANY_ID = 1;

it is back to about 1 to 1.5 seconds. The problem is that we may have more than one of these queries, and they are generated by Hibernate Criteria, which does not support HAVING, so my workaround won't work in real life.

Update: MySQL 5.7 EXPLAIN for the 12 month query (1050757 rows in 40 seconds)

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "673838.60"
    },
    "table": {
      "table_name": "this_",
      "access_type": "ref",
      "possible_keys": [
      "PRIMARY",
      "FK_EVENT_COMPANY",
      "IX_REFERENCE",
      "IX_DATE_TIME",
      "EVENT_DATETIME",
      "IDX_CE_COMPANY_TYPE",
      "IDX_CE_COMPANY_DATE",
      "IDX_CE_DATE_COMPANY"
      ],
      "key": "FK_EVENT_COMPANY",
      "used_key_parts": [
        "COMPANY_ID"
      ],
      "key_length": "4",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 2698153,
      "rows_produced_per_join": 1135826,
      "filtered": "42.10",
      "cost_info": {
        "read_cost": "134208.00",
        "eval_cost": "227165.40",
        "prefix_cost": "673838.60",
        "data_read_per_join": "1G"
      },
      "used_columns": [
        "EVENT_ID",
        "COMPANY_ID",
        "EVENT_DATETIME"
      ],
      "attached_condition": "((`test`.`this_`.`EVENT_DATETIME` >= '2018-10-22 00:00:00') and (`test`.`this_`.`EVENT_DATETIME` <= '2019-11-21 00:00:00'))"
    }
  }
}

EXPLAIN for the 10 month query

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "634047.16"
    },
    "table": {
      "table_name": "this_",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY",
        "FK_EVENT_COMPANY",
        "IX_REFERENCE",
        "IX_DATE_TIME",
        "EVENT_DATETIME",
        "IDX_CE_COMPANY_TYPE",
        "IDX_CE_COMPANY_DATE",
        "IDX_CE_DATE_COMPANY"
      ],
      "key": "IDX_CE_DATE_COMPANY",
      "used_key_parts": [
        "EVENT_DATETIME"
      ],
      "key_length": "9",
      "rows_examined_per_scan": 1578860,
      "rows_produced_per_join": 789430,
      "filtered": "50.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "476161.16",
        "eval_cost": "157886.00",
        "prefix_cost": "634047.16",
        "data_read_per_join": "1G"
      },
      "used_columns": [
        "EVENT_ID",
        "COMPANY_ID",
        "EVENT_DATETIME"
      ],
      "attached_condition": "((`test`.`this_`.`COMPANY_ID` = 1) and (`test`.`this_`.`EVENT_DATETIME` >= '2019-01-22 00:00:00') and (`test`.`this_`.`EVENT_DATETIME` <= '2019-11-21 00:00:00'))"
    }
  }
}

Interestingly, the first (slow) 12 month query does not show COMPANY_ID in the attached_condition, while the attached_condition of the second, 10 month query does include a check on COMPANY_ID.

ANALYZE TABLE, as was suggested, did not seem to change anything.

Update 2: EXPLAIN on MySQL 5.1 (which does not support the JSON format); the query takes 1.3 sec

           id: 1
  select_type: SIMPLE
        table: this_
         type: range
possible_keys: FK_EVENT_COMPANY,IX_DATE_TIME,EVENT_DATETIME,IDX_CE_COMPANY_TYPE,IDX_CE_COMPANY_DATE
          key: IDX_CE_COMPANY_DATE
      key_len: 16
          ref: NULL
         rows: 2018704
        Extra: Using where; Using index

The query planner may be making wrong decisions based on the available statistics. You can try running ANALYZE TABLE ( https://dev.mysql.com/doc/refman/5.6/en/analyze-table.html ) to rebuild the statistics and give the planner better numbers. Note that ANALYZE TABLE locks the table while it runs (it is fast).
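As a minimal sketch of that suggestion, using the table name from the question: refresh the statistics, look at the cardinality estimates the optimizer works from, then re-check the plan. Whether this changes the chosen index depends on the data; the question reports that it did not help in this case.

```sql
-- Rebuild the index statistics for the table from the question.
ANALYZE TABLE EVENT;

-- Inspect the per-index cardinality estimates the optimizer will use.
SHOW INDEX FROM EVENT;

-- Re-check which index the 12 month query picks afterwards.
EXPLAIN
SELECT COUNT(DISTINCT this_.EVENT_ID) AS y0_
FROM EVENT this_
WHERE this_.EVENT_DATETIME >= '2018-10-22 00:00:00'
  AND this_.EVENT_DATETIME <= '2019-11-21 00:00:00'
  AND this_.COMPANY_ID = 1;
```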

UPDATE

Reading the MySQL documentation, I found this paragraph:

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Ref: https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count

This means that the counting behavior changed in the very version you are moving to, which may explain the difference.
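The index hint mentioned in that paragraph can be sketched as follows. FORCE INDEX is standard MySQL index-hint syntax; the index name IDX_CE_COMPANY_DATE is taken from the EXPLAIN output in the question and is assumed to be the one on (COMPANY_ID, EVENT_DATETIME). This is an illustration rather than a verified fix, and a hint written by hand may not be expressible through Hibernate Criteria.

```sql
-- Ask the optimizer to use the composite index instead of FK_EVENT_COMPANY.
-- Assumption: IDX_CE_COMPANY_DATE is the index on (COMPANY_ID, EVENT_DATETIME).
SELECT COUNT(DISTINCT this_.EVENT_ID) AS y0_
FROM EVENT this_ FORCE INDEX (IDX_CE_COMPANY_DATE)
WHERE this_.EVENT_DATETIME >= '2018-10-22 00:00:00'
  AND this_.EVENT_DATETIME <= '2019-11-21 00:00:00'
  AND this_.COMPANY_ID = 1;
```

Running EXPLAIN on this statement should show whether the hinted index is actually picked up.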

The optimal index is最优指数为

INDEX(COMPANY_ID, EVENT_DATETIME, EVENT_ID)  -- in this order
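A sketch of how such an index could be created, assuming the table from the question; the index name IDX_CE_COMPANY_DATE_ID is hypothetical. Note that InnoDB secondary indexes implicitly carry the primary key columns, so if EVENT_ID is the primary key, the existing index on (COMPANY_ID, EVENT_DATETIME) already covers this query and the extra column adds nothing.

```sql
-- Hypothetical index name; column order matters:
-- the equality column first, then the range column, then the counted column.
ALTER TABLE EVENT
  ADD INDEX IDX_CE_COMPANY_DATE_ID (COMPANY_ID, EVENT_DATETIME, EVENT_ID);
```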

It appears that your date range covers about 13 months (2018-10-22 through 2019-11-21, with both endpoints included) rather than one year. Was that deliberate?

If EVENT_ID is the PRIMARY KEY (please provide SHOW CREATE TABLE), then COUNT(DISTINCT EVENT_ID) could simply be COUNT(*).
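Put together, the simplified query might look like the sketch below. It rests on two assumptions that the question does not confirm: that EVENT_ID is the primary key (so every row is already distinct), and that the intended window is exactly one year, written here as a half-open range so the end instant is not counted twice across adjacent windows.

```sql
-- Assumes EVENT_ID is the PRIMARY KEY and a one-year window is intended.
SELECT COUNT(*) AS y0_
FROM EVENT this_
WHERE this_.COMPANY_ID = 1
  AND this_.EVENT_DATETIME >= '2018-10-22 00:00:00'
  AND this_.EVENT_DATETIME <  '2019-10-22 00:00:00';
```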
