MySQL: how to speed up an SQL query for getting data
I am using a MySQL database.
I have a table daily_price_history of stock values stored with the following fields. It has 11 million+ rows.
id
symbolName
symbolId
volume
high
low
open
datetime
close
So for each stock symbolName there are various daily stock values, and the table now holds more than 11 million rows.
The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:
SELECT `daily_price_history`.`id`,
       `daily_price_history`.`symbolId_id`,
       `daily_price_history`.`volume`,
       `daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
         (SELECT U0.`id`
          FROM `daily_price_history` U0
          WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
                 AND U0.`datetime` >= 1598471533546))
       AND `daily_price_history`.`symbolName` IN ('A', 'AA', ...... 1500 symbol names))
I have the table indexed on symbolName and also on datetime.
Getting 130K rows of data (i.e. 1500 x 100 ~ 150000) takes 20 seconds.
I also have weekly_price_history and monthly_price_history tables. When I run similar SQL against them, they take less time for the same number (130K) of rows, because they hold less data than the daily table.
weekly_price_history: getting 150K rows takes 3s. The total number of rows in it is 2.5 million.
monthly_price_history: getting 150K rows takes 1s. The total number of rows in it is 800K.
So how can I speed the query up when the table is this large?
As a starter: I don't see the point of the subquery at all. Presumably, your query could filter directly in the where clause:
select id, symbolid_id, volume, close
from daily_price_history
where datetime >= 1598471533546 and symbolname in ('A', 'AA', ...)
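A quick way to convince yourself that dropping the subquery does not change the result is to run both forms on a tiny data set. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made only so the example runs anywhere), with a made-up four-row table; it relies on id being unique, which is what makes the correlated subquery redundant.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the query logic is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_price_history ("
    " id INTEGER PRIMARY KEY, symbolName TEXT, datetime INTEGER, close REAL)"
)

CUTOFF = 1598471533546  # epoch milliseconds, taken from the question
rows = [
    ("A", CUTOFF - 1, 10.0),   # id 1: too old, must be excluded
    ("A", CUTOFF, 11.0),       # id 2: exactly at the cutoff, included
    ("AA", CUTOFF + 5, 20.0),  # id 3: recent, included
    ("ZZ", CUTOFF + 5, 30.0),  # id 4: recent, but symbol not requested
]
conn.executemany(
    "INSERT INTO daily_price_history (symbolName, datetime, close) VALUES (?, ?, ?)",
    rows,
)

symbols = ["A", "AA"]  # hypothetical short list in place of the 1500 names
placeholders = ", ".join("?" for _ in symbols)

# Original shape: correlated subquery on the same table.
with_subquery = conn.execute(
    f"""
    SELECT p.id FROM daily_price_history p
    WHERE p.id IN (SELECT u.id FROM daily_price_history u
                   WHERE u.symbolName = p.symbolName AND u.datetime >= ?)
      AND p.symbolName IN ({placeholders})
    ORDER BY p.id
    """,
    [CUTOFF, *symbols],
).fetchall()

# Simplified shape: filter directly in the WHERE clause.
direct_filter = conn.execute(
    f"""
    SELECT id FROM daily_price_history
    WHERE datetime >= ? AND symbolName IN ({placeholders})
    ORDER BY id
    """,
    [CUTOFF, *symbols],
).fetchall()

print(with_subquery == direct_filter)
```

Both queries select the same ids here; on the real table the direct filter simply avoids re-probing the table once per candidate row.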
Then, you want an index on (datetime, symbolname):
create index idx_daily_price_history
on daily_price_history(datetime, symbolname)
;
The first column of the index matches the predicate on datetime. It is not very likely, however, that the database will be able to use the index to filter symbolname against a large list of values.
An alternative would be to put the list of values in a table, say symbolnames.
create table symbolnames (
symbolname varchar(50) primary key
);
insert into symbolnames values ('A'), ('AA'), ...;
Then you can do:
select p.id, p.symbolid_id, p.volume, p.close
from daily_price_history p
inner join symbolnames s on s.symbolname = p.symbolname
where p.datetime >= 1598471533546
That should allow the database to use the above index. We can go one step further and try adding the 4 columns of the select clause to the index, so that the query can be answered from the index alone:
create index idx_daily_price_history_2
on daily_price_history(datetime, symbolname, id, symbolid_id, volume, close)
;
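To make the lookup-table approach concrete, here is a minimal sketch of the join, again using Python's sqlite3 in place of MySQL purely so it can be run as-is. The sample rows and the two-symbol list are made up, and the datetime filter is applied to the price table (the symbolnames table has no datetime column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE daily_price_history (
        id INTEGER PRIMARY KEY, symbolname TEXT, datetime INTEGER,
        volume INTEGER, close REAL);
    CREATE INDEX idx_daily_price_history
        ON daily_price_history (datetime, symbolname);
    -- Lookup table holding the symbols we want to fetch.
    CREATE TABLE symbolnames (symbolname TEXT PRIMARY KEY);
    """
)

CUTOFF = 1598471533546  # epoch milliseconds, taken from the question
conn.executemany(
    "INSERT INTO daily_price_history (symbolname, datetime, volume, close)"
    " VALUES (?, ?, ?, ?)",
    [
        ("A", CUTOFF - 1, 100, 10.0),   # id 1: older than the cutoff
        ("A", CUTOFF, 110, 11.0),       # id 2: kept
        ("AA", CUTOFF + 5, 200, 20.0),  # id 3: kept
        ("ZZ", CUTOFF + 5, 300, 30.0),  # id 4: symbol not in the lookup table
    ],
)
# Hypothetical short list standing in for the 1500 symbol names.
conn.executemany("INSERT INTO symbolnames VALUES (?)", [("A",), ("AA",)])

joined = conn.execute(
    """
    SELECT p.id, p.symbolname, p.close
    FROM daily_price_history p
    INNER JOIN symbolnames s ON s.symbolname = p.symbolname
    WHERE p.datetime >= ?
    ORDER BY p.id
    """,
    (CUTOFF,),
).fetchall()

print(joined)
```

Whether MySQL actually picks the composite index for this join depends on its optimizer and statistics, so it is worth checking the plan with EXPLAIN on the real table.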
When you add INDEX(a,b), remove INDEX(a) as being no longer necessary.
Your dataset and query may be a case for using PARTITIONing.
PRIMARY KEY(symbolname, datetime)
PARTITION BY RANGE(datetime) ...
This will do "partition pruning" for datetime >= 1598471533546. Then the PRIMARY KEY will do most of the rest of the work for symbolname in ('A', 'AA', ...).
Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.
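As a rough illustration of what "about 50 partitions" could look like, the sketch below generates a PARTITION BY RANGE clause with one partition per month. It assumes the datetime column is an integer holding epoch milliseconds (consistent with the value used in the query), that month boundaries are taken in UTC, and that monthly granularity starting in January 2020 suits the data; all three are assumptions, and the function names are hypothetical helpers.

```python
from datetime import datetime, timezone


def month_start_ms(year, month):
    """Epoch milliseconds for 00:00 UTC on the first day of the given month."""
    return int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp() * 1000)


def monthly_partition_clause(first_year, first_month, count):
    """Build a RANGE partition clause with `count` monthly partitions plus a catch-all."""
    parts = []
    year, month = first_year, first_month
    for _ in range(count):
        # The upper bound of each partition is the start of the following month.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        upper = month_start_ms(next_year, next_month)
        parts.append(f"PARTITION p{year:04d}{month:02d} VALUES LESS THAN ({upper})")
        year, month = next_year, next_month
    # Catch-all partition for rows newer than the last explicit boundary.
    parts.append("PARTITION pmax VALUES LESS THAN MAXVALUE")
    return "PARTITION BY RANGE (datetime) (\n  " + ",\n  ".join(parts) + "\n)"


# 48 monthly partitions plus the catch-all gives 49, i.e. "about 50".
clause = monthly_partition_clause(2020, 1, 48)
print(clause)
```

The generated text would be appended to a CREATE TABLE or ALTER TABLE statement; new monthly partitions then need to be added over time, which is the maintenance topic the linked discussion covers.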
Yes, get rid of the subquery as GMB suggests.
Meanwhile, it sounds like Django is getting in the way.
Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint