Why is my SQL query not using the table's composite index?

I have a users table with the columns: id (primary key), type, external_id, external_type, created_at, updated_at

Indexes:

  • Primary (id)
  • Unique (external_id, external_type, type)
  • Non-unique (updated_at)

And a settings table with the columns: id, user_id, name, value, created_at, updated_at, type

Indexes:

  • Primary (id)
  • Unique (user_id, name)
  • Non-unique (user_id)
  • Non-unique (updated_at)

I execute the query:

SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at,
  settings.id, settings.settings_id, settings.name, settings.value,
  settings.created_at, settings.updated_at, settings.type
FROM users
  LEFT OUTER JOIN settings ON settings.user_id = users.id
WHERE users.external_id = 3 AND users.external_type = 'Owner'

In the EXPLAIN report, I see that:

  • For the users table, the (external_id, external_type, type) index was identified as a possible key, but NOT used
  • The settings table uses the (user_id, name) index

Goal

  • I want to optimize this query
  • So I want to get the users table to use the (external_id, external_type, type) composite index

Things I've done to debug:

  • If I change the first line of the SELECT statement to remove users.created_at, users.updated_at, it uses the index
  • If I try adding a (external_id, external_type) non-unique index to the users table, it still doesn't use it
  • If I change the query's WHERE clause to add AND users.type = 'Blah', it uses the index

What am I missing?

It is avoiding a double lookup

Your index is (external_id, external_type, type), but to get all the information the query needs, MySQL would have to use that index to find the rows, then use the id that is automatically included at the end of that index to look up the created_at and updated_at columns from the main table.

The optimizer judges that it would be simpler to go straight to the main table to begin with, and so ignores the index.

You can see evidence of this in your own statement:

If I change the first line of the SELECT statement to remove users.created_at, users.updated_at, it uses the index

Once you remove those columns, it no longer has to do a double lookup to complete the query. Being able to answer from the index alone is what gets it to choose that index.

As for the following:

If I change the query's WHERE clause to add AND users.type = 'Blah', it uses the index

I would guess that the optimizer now thinks the double lookup is worth it, because this more selective query reduces the number of rows enough. Understanding the optimizer's reasoning is not always easy, but this seems like the most obvious explanation.
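One way to test this guess is to compare the plan the optimizer picks on its own with the plan it produces when restricted to the composite index. This is only a sketch: the index name idx_users_external is hypothetical (the question does not give index names), so substitute the name reported by SHOW INDEX FROM users. The join to settings is left out to keep the comparison focused on the users table.

```sql
-- Find the real name of the (external_id, external_type, type) index.
SHOW INDEX FROM users;

-- Plan the optimizer chooses on its own.
EXPLAIN
SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at
FROM users
WHERE users.external_id = 3 AND users.external_type = 'Owner';

-- Same query, restricted to the composite index.
-- idx_users_external is a placeholder name, not taken from the question.
EXPLAIN
SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at
FROM users FORCE INDEX (idx_users_external)
WHERE users.external_id = 3 AND users.external_type = 'Owner';
```

Comparing the key and rows columns of the two plans shows how many rows the optimizer expects to read each way, which is the estimate its choice rests on.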

Solution

To get it to use the index, turn the index into a covering index so that no double lookup is needed:

(external_id, external_type, type, created_at, updated_at)

This index lets it avoid the double lookup: it can filter on the leading columns, and then use the remaining columns in the index to satisfy the SELECT for that table without having to jump back to the main table.
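As a rough sketch, the covering index could be created and checked like this. The index name idx_users_ext_covering is made up for illustration, and the EXPLAIN uses only the users part of the query for brevity.

```sql
-- Hypothetical index name; the column list is the one suggested above.
ALTER TABLE users
  ADD INDEX idx_users_ext_covering
    (external_id, external_type, type, created_at, updated_at);

-- Re-check the plan. If the new index is chosen and covers every users
-- column in the SELECT, the Extra column should report "Using index".
EXPLAIN
SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at
FROM users
WHERE users.external_id = 3 AND users.external_type = 'Owner';
```

The existing unique index on (external_id, external_type, type) would still be needed to enforce uniqueness, so this adds a second, wider index and some extra write cost.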

This answers the original version of the question.

You may be confusing the optimizer by using a LEFT JOIN and then filtering in the WHERE clause.

Start by writing the query as:

SELECT u.id, u.type, u.external_id, u.created_at, u.updated_at,
       s.id, s.settings_id, s.name, s.value, 
       s.created_at, s.updated_at, s.type
FROM users u JOIN
     settings s
     ON s.user_id = u.id
WHERE s.external_id = 3 and s.external_type = 'Owner'

The table aliases just make the query easier to write and read and don't affect performance.

Then, you want the following indexes:

  • settings(external_id, external_type, user_id)
  • user(id)

MySQL should use the settings index to find the users that match the external_id and external_type by just looking them up in the index. It will then use the user_id to look up the corresponding information in the users table. This should be the fastest approach.

Actually, you get the second for free because it is the primary key. I'm not bothering to create covering indexes, because you are selecting so many columns. But that might provide marginally better performance.

Not sure what version of MySQL you are using. Before MySQL 5.6, InnoDB did not persist its optimizer statistics, and even with persistent statistics the sampled estimates can misrepresent the data if it is skewed. In your case, the query optimizer may think a table scan is fastest if the statistics suggest that most rows in users have external_id = 3 and external_type = 'Owner': no index on the table covers the columns being selected, so if an index is used, the query engine still needs to look up each matching row in the table.

When you change the SELECT to request only columns that are in the index, the index becomes a covering index and the query engine no longer needs to do the lookup.
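If stale or unrepresentative statistics are suspected, they can be refreshed and inspected before looking at the plan again. A minimal sketch, assuming an InnoDB users table as described in the question:

```sql
-- Recompute the index statistics for the table.
ANALYZE TABLE users;

-- The Cardinality column shows the estimated number of distinct values
-- for each index prefix, which feeds the optimizer's row estimates.
SHOW INDEX FROM users;

-- Check whether the plan changes with the refreshed statistics.
EXPLAIN
SELECT users.id, users.type, users.external_id, users.created_at, users.updated_at
FROM users
WHERE users.external_id = 3 AND users.external_type = 'Owner';
```

If the plan is unchanged after refreshing, the statistics are probably not the deciding factor and the lookup cost discussed above is the more likely cause.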
