
MySQL query doesn't use index when table is big

I have a table:

CREATE TABLE `p` (  
`id` bigint(20) unsigned NOT NULL,  
`rtime` datetime NOT NULL,  
`d` int(10) NOT NULL,  
`n` int(10) NOT NULL,  
PRIMARY KEY (`rtime`,`id`,`d`) USING BTREE  
) ENGINE=MyISAM DEFAULT CHARSET=latin1;  

And I have a query:

select id, d, sum(n) from p where  rtime between '2012-08-25' and date(now()) group by id, d;

When I run EXPLAIN on this query against a tiny table (2 records), it tells me it is going to use my PK:

id  | select_type  | table | type   | possible_keys | key     | key_len | ref  | rows | Extra
1   | SIMPLE       | p     | range  | PRIMARY       | PRIMARY | 8       | NULL | 1    | Using where; Using temporary; Using filesort

But when I run the same query on the same table, only this time with a huge amount of data (350 million records), it prefers to go through all the records and ignore my keys:

id  | select_type  | table  | type | possible_keys  | key  | key_len | ref  | rows      | Extra
1   | SIMPLE       | p      | ALL  | PRIMARY        | NULL | NULL    | NULL | 355465280 | Using where; Using temporary; Using filesort

Obviously, this is extremely slow. Can anyone help?

EDIT: this simple query is also taking a significant amount of time:

select count(*) from propagation_delay where  rtime > '2012-08-28';

Your query:

...WHERE rtime between '2012-08-25' and date(now()) group by id, d;

filters on rtime and groups by id and d. At a minimum you ought to index by rtime. You might also want to try indexing by rtime, id, d, n, in this order, but if you do, you will see that the index contains more or less the same data as the table.

Probably, the optimizer does some calculations and comes to the conclusion that it's not really worthwhile to employ the index.

I'd keep an index on rtime only. The real clincher is how many records match the WHERE: if there are just a few, it pays to read the index and hop around the table. If there are many, it may be better to scan the whole table sequentially, saving on the to-and-fro reads.

The query is getting a big chunk out of those 350 million rows; I'd say a few million.

Okay, then it is likely that the cumulative cost of quickly extracting several million entries from the index, and then shuttling to and fro to the main table to fetch those several million records, is more than the cost of opening the main table and trawling through all 350M records, grouping and summing along the way.

In such a scenario, if you always (or mostly) run aggregate queries on rtime, AND the table is an accumulating (historical) table, AND each (id, d) pair sees dozens of entries per day, you might consider creating a secondary table aggregated by date. That is, at (say) midnight, you run a query such as:

INSERT INTO aggregate_table
    SELECT DATE(@yesterday) AS rtime, id, d, sum(n) AS n
    FROM main_table WHERE DATE(rtime) = @yesterday GROUP BY id, d;

The data in aggregate_table has only one entry per day for each (id, d) pair, holding the sum of n for that day; the table is proportionately smaller, and queries against it are faster. This assumes that you have a comparatively small number of (id, d) pairs and that each of them generates lots of rows in the main table each day.
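A minimal sketch of what the secondary table and the nightly job might look like. The table definition, the BIGINT type for the summed column, and the half-open range in the WHERE clause are assumptions, not part of the answer above; the column types otherwise mirror `p`. The range form is used because wrapping the column as `DATE(rtime)` keeps MySQL from doing a range scan on an index that starts with rtime.

```sql
-- Hypothetical schema: one row per (day, id, d).
CREATE TABLE aggregate_table (
  rtime DATE NOT NULL,
  id    BIGINT UNSIGNED NOT NULL,
  d     INT NOT NULL,
  n     BIGINT NOT NULL,          -- assumption: daily sums may not fit in INT
  PRIMARY KEY (rtime, id, d)
);

-- Nightly job: aggregate yesterday's rows from the main table.
SET @yesterday = CURDATE() - INTERVAL 1 DAY;

INSERT INTO aggregate_table
    SELECT @yesterday AS rtime, id, d, SUM(n) AS n
    FROM main_table
    WHERE rtime >= @yesterday
      AND rtime <  @yesterday + INTERVAL 1 DAY
    GROUP BY id, d;

-- Reporting query against the much smaller table.
SELECT id, d, SUM(n)
FROM aggregate_table
WHERE rtime BETWEEN '2012-08-25' AND CURDATE() - INTERVAL 1 DAY
GROUP BY id, d;
```

The reporting query only covers complete days, through yesterday; rows logged today would still have to be read from the main table.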

With one log entry per minute per pair, aggregation should speed things up by about three orders of magnitude, since 1,440 rows per pair per day collapse into one. Conversely, if you only have twice-daily readings from a huge number of different sensors, the benefit will be negligible.

In your second query, the date range was going to return so many rows that MySQL decided not to use the index. It did this because n is not included in the index. A non-covering index still requires a lookup into the table for every matching row, and doing a large number of lookups is slower than scanning the table.

In order to utilize an index, you'll need to reduce the number of selected rows, or include n in your index to have a full "covering" index.
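As a rough sketch, a covering index for the query in the question could be added like this. The index name `idx_rtime_id_d_n` is made up for illustration, and building it on 350 million rows will take a long time and a lot of disk space, so it is worth trying on a copy first.

```sql
-- Hypothetical index name; the index holds every column the query reads.
ALTER TABLE p ADD INDEX idx_rtime_id_d_n (rtime, id, d, n);

-- Re-check the plan afterwards.
EXPLAIN
SELECT id, d, SUM(n)
FROM p
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW())
GROUP BY id, d;
```

If the optimizer picks the new index as a covering index, the Extra column of the EXPLAIN output should include "Using index", meaning the rows are answered from the index alone without touching the data file.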

You may be able to make MySQL use a particular index with the index hint syntax.
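For illustration, the hinted form of the query from the question might look like the following. `FORCE INDEX` tells the optimizer to treat a table scan as very expensive, so it will prefer the named index whenever it can be used; whether that is actually faster on this data is something to measure, not assume.

```sql
-- Ask MySQL to prefer the primary key over a full table scan.
SELECT id, d, SUM(n)
FROM p FORCE INDEX (PRIMARY)
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW())
GROUP BY id, d;
```

Running EXPLAIN on both the hinted and the unhinted query, and timing each, shows whether the optimizer's original choice of a full scan was the right one.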

Just a hunch, backed by a little experience: try changing the engine from MyISAM to InnoDB. MyISAM has some problems with large numbers of records, as well as other bugs, and InnoDB is now the better choice. Also, as of MySQL 5.5 the default engine is InnoDB: http://dev.mysql.com/doc/refman/5.5/en/innodb-default-se.html
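A sketch of the conversion, assuming there is enough free disk space and a maintenance window, since the statement rebuilds the whole table:

```sql
-- Rebuilds the table under the InnoDB engine; slow on 350 million rows.
ALTER TABLE p ENGINE=InnoDB;

-- Re-check the plan after the conversion.
EXPLAIN
SELECT id, d, SUM(n)
FROM p
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW())
GROUP BY id, d;
```

One reason this can help here: InnoDB stores rows clustered by primary key, so with PRIMARY KEY (rtime, id, d) a range on rtime reads the matching rows, including n, directly in key order rather than hopping from an index to a separate data file. Trying it on a copy of the table first is the safer route.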
