Why doesn't the MySQL optimizer use all columns of the index?

Percona MySQL 5.7

Table schema:

CREATE TABLE Developer.Rate (
  ID bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (ID),
  INDEX IDX_TIME (TIME),
  UNIQUE INDEX IDX_UK (BASE, QUOTE, TIME)
)
ENGINE = INNODB
ROW_FORMAT = COMPRESSED;

I am trying to fetch the latest row at or before a selected point in time. The optimizer uses only part of the unique key: 2 of its 3 columns.

If I write the query the usual way:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1
;

EXPLAIN shows that only the first 2 columns of the index are used: BASE and QUOTE.

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "ref",
        "possible_keys": [
          "IDX_UK",
          "IDX_TIME"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE"
        ],
        "key_length": "22",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 22983231,
        "filtered": "50.00",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "4596646.20",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "1G"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

But if I force the optimizer to use IDX_UK, MySQL uses all 3 columns of the index:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate FORCE INDEX(IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "range",
        "possible_keys": [
          "IDX_UK"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 15320621,
        "filtered": "100.00",
        "index_condition": "((`Developer`.`Rate`.`BASE` = 'EUR') and (`Developer`.`Rate`.`QUOTE` = 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "3064124.31",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "818M"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ]
      }
    }
  }
}
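As a rough cross-check, the key_length values in the two plans (22 and 27) agree with the used_key_parts lists. The sketch below assumes the Rate table uses a character set with up to 3 bytes per character, such as utf8 (the Rate schema does not state its character set, so this is an assumption). Under that assumption each VARCHAR(3) NOT NULL key part takes 3 × 3 bytes plus a 2-byte length prefix, and a DATETIME without fractional seconds takes 5 bytes in MySQL 5.7.

```sql
-- Back-of-the-envelope check of key_length (assumes up to 3 bytes per character)
SELECT
  (3 * 3 + 2) * 2     AS key_length_base_quote,      -- BASE + QUOTE
  (3 * 3 + 2) * 2 + 5 AS key_length_base_quote_time; -- BASE + QUOTE + TIME (DATETIME, 5 bytes)
```

The two results, 22 and 27, match the plans: the first plan stops after QUOTE, while the forced plan also uses TIME.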

Why doesn't the optimizer use all 3 columns unless the index is specified explicitly?
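One way to investigate this kind of question is the optimizer trace, which records the access paths the optimizer considered and their estimated costs. The following is only a sketch of how the trace could be captured for the query without FORCE INDEX; it is not something the original post did, and the trace may be truncated if it exceeds optimizer_trace_max_mem_size.

```sql
-- Sketch: capture the optimizer trace for the plan chosen without FORCE INDEX
SET optimizer_trace = 'enabled=on';

-- EXPLAIN is traced too, so the query does not have to be executed
EXPLAIN
SELECT BID
FROM Rate
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;

-- The trace is a JSON document; a non-zero second column means it was truncated
SELECT TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

In the trace, the range_analysis and considered_execution_plans sections are the places to compare the cost of ref access on (BASE, QUOTE) with range access on all three columns.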

Added:

Do I understand correctly that I should use a query like this?

Request example:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  BASE DESC, QUOTE DESC, TIME DESC
LIMIT 1

If I understand correctly, the EXPLAIN output is no better. Still only 2 columns are used, without TIME.

EXPLAIN output:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10384642.20"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "ref",
        "possible_keys": [
          "IDX_UK",
          "IDX_TIME"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE"
        ],
        "key_length": "22",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 46734411,
        "rows_produced_per_join": 23367205,
        "filtered": "50.00",
        "index_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "4673441.10",
          "prefix_cost": "10384642.20",
          "data_read_per_join": "1G"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ]
      }
    }
  }
}

Added 2:

I ran these 4 requests:

- 1 -

FLUSH STATUS;
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

- 2 -

FLUSH STATUS;
SELECT
  BID
FROM 
  Rate FORCE INDEX (IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

- 3 -

FLUSH STATUS;
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

- 4 -

FLUSH STATUS;
SELECT
  BID
FROM 
  Rate FORCE INDEX (IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';

The SHOW SESSION STATUS output is the same for all requests except request 3. For request 3: Handler_read_prev = 486474. For all other requests: Handler_read_prev = 0.
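To make such comparisons easier to read, the counters can be filtered down to the read handlers that actually changed. This is only a sketch, and it assumes the Performance Schema is enabled and show_compatibility_56 is OFF (the default from MySQL 5.7.8), so that performance_schema.session_status is populated.

```sql
-- Sketch: after FLUSH STATUS and the query under test,
-- list only the Handler_read_* counters that are non-zero
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.session_status
WHERE VARIABLE_NAME LIKE 'Handler_read%'
  AND VARIABLE_VALUE <> '0';
```

A large Handler_read_prev, as in request 3, means the server walked backwards through that many index entries before finding a row that satisfied the TIME condition.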

Added 3:

I made a copy of the table, removed the ID column, and promoted the UNIQUE key to PRIMARY.

The schema:

CREATE TABLE Developer.Rate2 (
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (BASE, QUOTE, TIME),
  INDEX IDX_BID_ASK (BID, ASK)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 26
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = COMPRESSED;
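The post does not show how the copy was filled or the exact statement behind the plan that follows. A plausible sketch, assuming the rows were copied from Developer.Rate and the same query was then run against Rate2 without FORCE INDEX:

```sql
-- Sketch: copy the rows in primary-key order (ID is intentionally left out)
INSERT INTO Developer.Rate2 (`TIME`, BASE, QUOTE, BID, ASK)
SELECT `TIME`, BASE, QUOTE, BID, ASK
FROM Developer.Rate
ORDER BY BASE, QUOTE, `TIME`;

-- Presumably the same query as before, now against Rate2
EXPLAIN FORMAT=JSON
SELECT
  BID
FROM
  Developer.Rate2
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
  `TIME` DESC
LIMIT 1;
```

Inserting in primary-key order is a design choice that tends to keep the clustered index compact; it is not required for correctness.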

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "9673452.20"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate2",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 48023345,
        "rows_produced_per_join": 16006180,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "68783.20",
          "eval_cost": "3201236.12",
          "prefix_cost": "9673452.20",
          "data_read_per_join": "732M"
        },
        "used_columns": [
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate2`.`BASE` = 'EUR') and (`Developer`.`Rate2`.`QUOTE` = 'USD') and (`Developer`.`Rate2`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

Now the query really works: EXPLAIN shows that all 3 key columns are used.

Get rid of ID; it is of no use. Promote your UNIQUE key to be PRIMARY. Now, magically, the query will be faster, and the question you posed will become moot. (You may also need the DESC trick that lorraine suggested.)

Here's another technique to compare performance:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

I would be interested to see the output from the SHOW with and without the DESC trick, and with and without the FORCE INDEX you alluded to.

Why faster? Your query was using a secondary index, but it needed BID, which was not 'covered' by the index. To get BID, each matching index entry required a lookup through the PRIMARY KEY into the 'data' (the clustered index). By changing it so that the PK is used, this extra drill-down is obviated.
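To illustrate what 'covered' means here, the sketch below adds BID to a secondary index so that the query can be answered from the index alone. This is a different approach from promoting the key to PRIMARY, shown only for illustration; IDX_COVER is a hypothetical name, and building an index on a table with tens of millions of rows is itself an expensive operation.

```sql
-- Hypothetical covering index: IDX_COVER is an illustrative name,
-- not part of the original schema
ALTER TABLE Developer.Rate
  ADD INDEX IDX_COVER (BASE, QUOTE, `TIME`, BID);

-- If the optimizer picks IDX_COVER, the JSON plan would be expected
-- to report "using_index": true, meaning no lookup into the clustered index
EXPLAIN FORMAT=JSON
SELECT
  BID
FROM
  Developer.Rate
WHERE
  BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
  `TIME` DESC
LIMIT 1;
```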

The behavior you describe (ref access instead of range access over more columns) reminds me of Bug#81341 and Bug#87613. These bugs were fixed in MySQL 5.7.17 and 5.7.21, respectively. Which version are you using?
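The exact server version can be read from the server itself, for example:

```sql
-- Reports the version string and the build/distribution comment
SELECT VERSION(), @@version_comment;
```

The version string can then be compared against the fix versions mentioned above. Whether a given Percona build includes those upstream fixes is an assumption to verify against its release notes.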

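Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.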

 