MySQL Slow Query - Using Filesort

I am trying to run a query on my MySQL database that takes 70+ seconds to run, and I'm scratching my head as to why the index isn't being used.

Here's the query:

SELECT PriceId, InstrumentId, Date, Open, High, Low, Close, Volume, UnadjustedClose
FROM price
ORDER BY InstrumentId, Date DESC

The price table has an index on InstrumentId, Date (amongst other indexes). The table itself has 80 million rows, and is made up of 2 ints, a date, a long and 5 decimals.

EXPLAIN reports type ALL, NULL for possible_keys, key and ref, and tells me the system is using filesort.

Is this the best I can get from the system? I expected the index to be used to make the sort faster.

Added:

Here's the table definition:

PriceId int PK, NN, AI
InstrumentId int NN
Date Date NN
Open Decimal(12,4)
High Decimal(12,4)
Low Decimal(12,4)
Close Decimal(12,4)
UnadjustedClose Decimal(12,4)
Volume BigInt

Indexes:

Primary -> PriceId
IX_InstrumentId -> InstrumentId
IX_Date -> Date
IX_InstrumentDate -> InstrumentId, Date
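
For anyone who wants to reproduce the setup, the shorthand above corresponds roughly to the DDL below. This is only a sketch: the storage engine (InnoDB) is an assumption, since the question does not state it, and the backticks are there only because Date, Open and Close are also MySQL keywords.

```sql
-- Sketch of the table described above; ENGINE=InnoDB is an assumption.
CREATE TABLE price (
  PriceId         INT NOT NULL AUTO_INCREMENT,
  InstrumentId    INT NOT NULL,
  `Date`          DATE NOT NULL,
  `Open`          DECIMAL(12,4),
  High            DECIMAL(12,4),
  Low             DECIMAL(12,4),
  `Close`         DECIMAL(12,4),
  UnadjustedClose DECIMAL(12,4),
  Volume          BIGINT,
  PRIMARY KEY (PriceId),
  KEY IX_InstrumentId (InstrumentId),
  KEY IX_Date (`Date`),
  KEY IX_InstrumentDate (InstrumentId, `Date`)
) ENGINE=InnoDB;
```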

The EXPLAIN output is:

id: 1
select_type: Simple
table: price
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 77926335
Extra: using filesort

The optimizer will not use the index because you are retrieving all rows and the index does not contain all the columns you are selecting. In other words, the index is not a covering index.

In most cases, when you are retrieving everything, walking the index and then looking up each record to fetch the additional columns is less efficient than scanning the whole table.

You have some options:

  • Include all the necessary columns in your index: this requires more space and slows down write operations.
  • Add a filter to the query based on the first column in the index. If the filter is selective enough (it shrinks the number of rows required to a reasonable level), the server will use your index.
  • Filter your data to a reasonable size
  • Do the sorting in the application
  • Modify the primary key (the clustering) to (InstrumentID ASC, Date DESC)
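
As a rough sketch of the first two options: the index name IX_PriceCovering and the instrument ID 42 are made up for illustration, and InnoDB is assumed.

```sql
-- Option 1: a covering index, so the SELECT list can be read from the index alone.
-- IX_PriceCovering is a hypothetical name. With InnoDB (assumed), secondary
-- indexes already carry the primary key, so PriceId does not need to be listed.
ALTER TABLE price
  ADD INDEX IX_PriceCovering
    (InstrumentId, Date, Open, High, Low, Close, Volume, UnadjustedClose);

-- Option 2: filter on the leading index column. 42 is a placeholder ID.
-- With InstrumentId fixed, IX_InstrumentDate can be scanned backwards to
-- return rows in Date DESC order, so no filesort should be needed.
EXPLAIN
SELECT PriceId, InstrumentId, Date, Open, High, Low, Close, Volume, UnadjustedClose
FROM price
WHERE InstrumentId = 42
ORDER BY Date DESC;
```

One caveat worth checking against your server version: the original ORDER BY mixes an ascending and a descending column. Before MySQL 8.0, index key parts are always stored ascending (DESC in an index definition is parsed but ignored), so an ascending index cannot satisfy that mixed ordering for the unfiltered query. From MySQL 8.0 on, the key part can be declared as Date DESC.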

EDIT: More about the last option

Your table looks like a log table. In log tables it seems to be good practice to add a unique integer ID to each record to eliminate duplicates (but in most cases it is not). In most cases you never actually use that ID. In MySQL (with InnoDB) the primary key is also the clustering key, which means the data is stored in that order on disk (more or less, ignoring fragmentation for now).

In log tables it is a good idea to use the logged entity's ID and a timestamp (InstrumentID, Date in your case) as the clustered index (the primary key in MySQL). When you do this, the physical order of your data matches the common business needs, which means query performance will be better.

If the combination of InstrumentID and Date is unique (I think it should be: an instrument cannot have multiple prices at the same time, and it is really rare for a price to change in less than a second), a composite key on those two columns could be better. It also gives you a better option for partitioning the table than auto-generated integer values.

Side note: you can change the order of the columns in the PK if you filter or sort by date more frequently than by instrument ID.

END OF EDIT

Some questions you should answer in order to find a better way to achieve your goal:
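
A minimal sketch of the primary key change described in this edit, assuming InnoDB and assuming that (InstrumentId, Date) really is unique in the existing data. The key name UX_PriceId is made up for illustration.

```sql
-- Re-cluster the table on (InstrumentId, Date).
-- Assumptions: InnoDB, and no duplicate (InstrumentId, Date) pairs exist.
-- UX_PriceId is a hypothetical name; an AUTO_INCREMENT column must stay the
-- first column of some index, which is why PriceId gets its own unique key.
ALTER TABLE price
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (InstrumentId, Date),
  ADD UNIQUE KEY UX_PriceId (PriceId);

-- Check whether the plan still reports "Using filesort" afterwards.
EXPLAIN
SELECT PriceId, InstrumentId, Date, Open, High, Low, Close, Volume, UnadjustedClose
FROM price
ORDER BY InstrumentId, Date DESC;
```

On 80 million rows this ALTER rebuilds the whole table, so it is worth trying on a copy first. Afterwards IX_InstrumentDate duplicates the primary key and could probably be dropped. Whether the filesort disappears for the mixed ASC/DESC ordering depends on the server version: only MySQL 8.0 and later honor a descending key part such as Date DESC.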

  • Why do you need to retrieve all 80M records from the table?
  • Does your application really use all of them?
  • If yes, is it possible to do the sorting at the application level instead of the database level?
  • Does the order of the records really matter?

You can't speed it up because of the large number of rows. Create a materialized view from this query; once it is created, access will be faster.

MySQL doesn't support materialized views, so you have to implement one yourself, using the tutorial here.
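
One common way to emulate a materialized view in MySQL is a snapshot table that is rebuilt periodically and swapped in with RENAME TABLE. The sketch below is not taken from the linked tutorial. The table names price_sorted, price_sorted_new and price_sorted_old are hypothetical, and it assumes InnoDB and a unique (InstrumentId, Date) pair.

```sql
-- Hypothetical snapshot table, clustered in the order the report needs.
-- Assumes InnoDB and unique (InstrumentId, Date); if that does not hold,
-- append PriceId to the primary key.
CREATE TABLE price_sorted (
  InstrumentId    INT NOT NULL,
  `Date`          DATE NOT NULL,
  PriceId         INT NOT NULL,
  `Open`          DECIMAL(12,4),
  High            DECIMAL(12,4),
  Low             DECIMAL(12,4),
  `Close`         DECIMAL(12,4),
  Volume          BIGINT,
  UnadjustedClose DECIMAL(12,4),
  PRIMARY KEY (InstrumentId, `Date`)
) ENGINE=InnoDB;

-- Refresh: fill a fresh copy, then swap it in with one atomic rename.
CREATE TABLE price_sorted_new LIKE price_sorted;

INSERT INTO price_sorted_new
  (InstrumentId, `Date`, PriceId, `Open`, High, Low, `Close`, Volume, UnadjustedClose)
SELECT InstrumentId, `Date`, PriceId, `Open`, High, Low, `Close`, Volume, UnadjustedClose
FROM price;

RENAME TABLE price_sorted     TO price_sorted_old,
             price_sorted_new TO price_sorted;
DROP TABLE price_sorted_old;
```

Readers of price_sorted still need an ORDER BY to be guaranteed an order; the point of the snapshot is only that the data is already clustered close to that order, and that the refresh cost is paid outside the reporting query.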
