Optimizing MySQL query with a composite index

I have a table which currently has about 80 million rows, created as follows:

create table records
(
  id      int auto_increment primary key,
  created int             not null,
  status  int default '0' not null
)
  collate = utf8_unicode_ci;

create index created_and_status_idx
  on records (created, status);

The created column contains Unix timestamps, and status is an integer between -10 and 10. The records are evenly distributed by creation date, and around half of them have status 0 or -10.

I have a cron job that selects records between 8 and 32 days old with certain statuses, processes them, and then deletes them. The query is as follows:

SELECT
    records.id
FROM records
WHERE
    (records.status = 0 OR records.status = -10)
    AND records.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500

The query was fast while the matching records were at the beginning of the creation interval, but now that the cleanup has reached the records at the end of the interval, it takes about 10 seconds to run. EXPLAIN says the query uses the index, but it examines about 40 million rows.
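For reference, a plan like the one described can be obtained by prefixing the query with EXPLAIN. This is only a sketch of how to check it; the actual output depends on the server version and table statistics. In the result, the key column names the index chosen and the rows column is the optimizer's estimate of how many rows will be examined.

```sql
-- Show the execution plan for the cleanup query without running it.
-- `key`  : the index the optimizer picked
-- `rows` : estimated number of rows it expects to examine
EXPLAIN
SELECT records.id
FROM records
WHERE (records.status = 0 OR records.status = -10)
  AND records.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400
                          AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500;
```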

My question is whether there is anything I can do to improve the performance of the query, and if so, how exactly.

Thank you.

I think UNION ALL is your best approach:

(SELECT r.id
 FROM records r
 WHERE r.status = 0 AND
       r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
 LIMIT 500
) UNION ALL
(SELECT r.id
 FROM records r
 WHERE r.status = -10 AND
       r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400 AND UNIX_TIMESTAMP() - 8 * 86400
 LIMIT 500
) 
LIMIT 500;

This can use an index on records(status, created, id). Note: use UNION if records.id could have duplicates.
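A minimal sketch of creating that index follows. The index name status_created_id_idx is made up for illustration. If the table uses InnoDB (an assumption, since the engine is not stated in the question), secondary indexes already carry the primary key, so listing id explicitly is optional there.

```sql
-- Hypothetical index name; the column order is the part that matters:
-- equality column first, range column next, then id so the index covers the query.
CREATE INDEX status_created_id_idx
  ON records (status, created, id);
```

Building an index on a table of this size can take a while, so it is worth trying on a copy first.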

You are also using LIMIT with no ORDER BY. That is generally discouraged, because without an ORDER BY the choice of which 500 rows come back is not deterministic.
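One way to add an ordering to the UNION ALL query is sketched below. It assumes that processing the oldest rows first is acceptable, which the question does not state. With an index on records(status, created, id), each branch can read its rows in index order, and the outer ORDER BY only has to sort at most 1000 rows.

```sql
-- Each branch returns its 500 oldest matching rows in (created, id) order.
(SELECT r.id, r.created
 FROM records r
 WHERE r.status = 0
   AND r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400
                     AND UNIX_TIMESTAMP() - 8 * 86400
 ORDER BY r.created, r.id
 LIMIT 500)
UNION ALL
(SELECT r.id, r.created
 FROM records r
 WHERE r.status = -10
   AND r.created BETWEEN UNIX_TIMESTAMP() - 32 * 86400
                     AND UNIX_TIMESTAMP() - 8 * 86400
 ORDER BY r.created, r.id
 LIMIT 500)
-- The outer clause picks the 500 oldest rows across both branches.
ORDER BY created, id
LIMIT 500;
```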

Your index is in the wrong order. You should put the IN column (status) first (you phrased it as an OR), and put the 'range' column (created) last:

INDEX(status, created)
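As a sketch, the index could be added and the query rephrased with IN as follows. The index name status_and_created_idx is hypothetical, and whether the existing created_and_status_idx can be dropped afterwards depends on what other queries rely on it.

```sql
-- Hypothetical index name; equality/IN column first, range column last.
ALTER TABLE records
  ADD INDEX status_and_created_idx (status, created);

-- The same filter as in the question, written with IN instead of OR.
SELECT id
FROM records
WHERE status IN (0, -10)
  AND created BETWEEN UNIX_TIMESTAMP() - 32 * 86400
                  AND UNIX_TIMESTAMP() - 8 * 86400
LIMIT 500;
```

With this column order, each status value maps to one contiguous range of the index, so the scan no longer has to step over rows whose status does not match.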

(Don't give me any guff about "cardinality"; we are not looking at individual columns.)

Are there really only 3 columns in the table? Do you need id? If not, get rid of it and change to:

PRIMARY KEY(status, created)

Other techniques for walking through large tables efficiently.
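One commonly used technique of this kind, shown here as an assumption about what is meant rather than as the answer's own method, is to remember where the previous batch ended and resume from that position instead of rescanning from the start of the window. The user variables @last_created and @last_id are hypothetical; the cron job would set them from the last row of the previous batch. The sketch handles one status value and relies on an index on records(status, created, id).

```sql
-- Starting position for the first batch: the beginning of the window.
SET @last_created = UNIX_TIMESTAMP() - 32 * 86400;
SET @last_id = 0;

-- Fetch the next batch strictly after (@last_created, @last_id).
SELECT r.id, r.created
FROM records r
WHERE r.status = 0
  AND r.created >= @last_created
  AND (r.created > @last_created OR r.id > @last_id)
  AND r.created <= UNIX_TIMESTAMP() - 8 * 86400
ORDER BY r.created, r.id
LIMIT 500;

-- After processing, set @last_created and @last_id to the values of the
-- last row returned, then run the SELECT again for the next batch.
```

This matters mostly when rows are not deleted immediately after being read; if every fetched row is deleted, the plain query with the reordered index already starts at the first remaining match.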
