MYSQL: how to speed up an SQL query for getting data

I am using a MySQL database.

I have a table daily_price_history of stock values stored with the following fields. It has 11 million+ rows.

id
symbolName
symbolId
volume
high
low
open
datetime
close

So for each stock symbolName there are many daily stock values, and the table now holds more than 11 million rows.

The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:

SELECT `daily_price_history`.`id`,
       `daily_price_history`.`symbolId_id`,
       `daily_price_history`.`volume`,
       `daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
         (SELECT U0.`id`
          FROM `daily_price_history` U0
          WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
                 AND U0.`datetime` >= 1598471533546))
       AND `daily_price_history`.`symbolName` IN (A,AA, ...... 1500 symbols Names)

I have the table indexed on symbolName and also on datetime.

Getting 130K rows of data (i.e. 1500 x 100 ~ 150000) takes 20 seconds.

I also have weekly_price_history and monthly_price_history tables. When I run similar SQL against them, it takes less time for the same number (130K) of rows, because those tables hold less data than the daily one.

weekly_price_history: getting 150K rows takes 3s. The table has 2.5 million rows in total.

monthly_price_history: getting 150K rows takes 1s. The table has 800K rows in total.

So how can I speed this up when the table is large?

As a starter: I don't see the point of the subquery at all. Presumably, your query could filter directly in the where clause:

select id, symbolid_id, volume, close
from daily_price_history
where datetime >= 1598471533546 and symbolname in ('A', 'AA', ...)

Then, you want an index on (datetime, symbolname):

create index idx_daily_price_history 
    on daily_price_history(datetime, symbolname)
;

The first column of the index matches the predicate on datetime. It is not very likely, however, that the database will be able to use the index to filter symbolname against a large list of values.

An alternative would be to put the list of values in a table, say symbolnames.
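One way to check what the optimizer actually does with this index is to run EXPLAIN on the rewritten query. This is only a sketch: the two-symbol list below stands in for the full list, and the plan you get will depend on your MySQL version and data distribution.

```sql
-- Show the execution plan instead of running the query.
-- The IN list is shortened to two sample symbols for illustration.
EXPLAIN
SELECT id, symbolid_id, volume, close
FROM daily_price_history
WHERE datetime >= 1598471533546
  AND symbolname IN ('A', 'AA');
```

In the output, the key column names the index the optimizer chose (or NULL if none), and the rows column gives its estimate of how many rows it expects to examine.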

create table symbolnames (
    symbolname varchar(50) primary key
);
insert into symbolnames values ('A'), ('AA'), ...; 

Then you can do:

select p.id, p.symbolid_id, p.volume, p.close
from daily_price_history p
inner join symbolnames s on s.symbolname = p.symbolname
where p.datetime >= 1598471533546

That should allow the database to use the above index. We can take one step further and try adding the 4 columns of the select clause to the index:

create index idx_daily_price_history_2 
    on daily_price_history(datetime, symbolname, id, symbolid_id, volume, close)
;
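To see whether the wider index really covers the query, you could run EXPLAIN on the join. This is a sketch that assumes the symbolnames table and the idx_daily_price_history_2 index from above exist; the datetime filter is applied to the price table alias p, since symbolnames has no datetime column.

```sql
-- Check whether the join can be answered from the index alone.
EXPLAIN
SELECT p.id, p.symbolid_id, p.volume, p.close
FROM daily_price_history p
INNER JOIN symbolnames s ON s.symbolname = p.symbolname
WHERE p.datetime >= 1598471533546;
```

When an index covers a query, the Extra column for that table typically includes "Using index", meaning the rows were read from the index without visiting the table data.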

When you add INDEX(a,b), remove INDEX(a), as it is no longer necessary.

Your dataset and query may be a case for using PARTITIONing.
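Applied to this table, that would mean that once a composite index on (datetime, symbolname) exists, a single-column index on datetime is redundant. A separate index on symbolname alone is not made redundant by it, because symbolname is not the leading column. A minimal sketch, where idx_datetime is a hypothetical index name (the question does not give the real one):

```sql
-- List the existing indexes to find the single-column one on datetime.
SHOW INDEX FROM daily_price_history;

-- idx_datetime is a hypothetical name; substitute the Key_name
-- reported by SHOW INDEX for the datetime-only index.
ALTER TABLE daily_price_history DROP INDEX idx_datetime;
```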

PRIMARY KEY(symbolname, datetime)

PARTITION BY RANGE(datetime) ...

This will do "partition pruning" for datetime >= 1598471533546. Then the PRIMARY KEY will do most of the rest of the work for symbolname in ('A', 'AA', ...).

Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.

Yes, get rid of the subquery as GMB suggests.

Meanwhile, it sounds like Django is getting in the way.

Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
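A rough sketch of what such a table could look like follows. Everything not named in the question is an assumption: the new table name, the column types, id being AUTO_INCREMENT, datetime holding epoch milliseconds (the literal in the query suggests this), and the monthly boundaries. It also assumes at most one row per symbol per timestamp, which the primary key requires.

```sql
-- Hypothetical partitioned copy of the table; column types are guesses.
CREATE TABLE daily_price_history_partitioned (
    id            BIGINT NOT NULL AUTO_INCREMENT,
    `symbolName`  VARCHAR(50) NOT NULL,
    `symbolId_id` INT NOT NULL,
    `volume`      BIGINT,
    `high`        DECIMAL(18,4),
    `low`         DECIMAL(18,4),
    `open`        DECIMAL(18,4),
    `datetime`    BIGINT NOT NULL,   -- assumed epoch milliseconds
    `close`       DECIMAL(18,4),
    -- The partitioning column must be part of every unique key.
    PRIMARY KEY (`symbolName`, `datetime`),
    -- An AUTO_INCREMENT column must lead some index.
    KEY idx_id (id)
)
PARTITION BY RANGE (`datetime`) (
    PARTITION p2020_07 VALUES LESS THAN (1596240000000),  -- before 2020-08-01 UTC
    PARTITION p2020_08 VALUES LESS THAN (1598918400000),  -- before 2020-09-01 UTC
    PARTITION p2020_09 VALUES LESS THAN (1601510400000),  -- before 2020-10-01 UTC
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- The partitions column of the plan shows which partitions are scanned.
EXPLAIN
SELECT id, `symbolId_id`, `volume`, `close`
FROM daily_price_history_partitioned
WHERE `datetime` >= 1598471533546
  AND `symbolName` IN ('A', 'AA');
```

Only three monthly partitions are shown to keep the sketch short; a real table would need boundaries covering its whole date range, sized so the total lands near the suggested count.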
