
MySQL: how to speed up an SQL query for fetching data

I am using a MySQL database.

I have a table daily_price_history of stock values stored with the following fields. It has 11 million+ rows:

id
symbolName
symbolId
volume
high
low
open
datetime
close

So for each stock symbolName there are daily stock values, and the table now holds more than 11 million rows.

The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:

SELECT `daily_price_history`.`id`,
       `daily_price_history`.`symbolId_id`,
       `daily_price_history`.`volume`,
       `daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
         (SELECT U0.`id`
          FROM `daily_price_history` U0
          WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
                 AND U0.`datetime` >= 1598471533546))
       AND `daily_price_history`.`symbolName` IN ('A', 'AA', ...... 1500 symbol names))

I have the table indexed on symbolName and also datetime
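To make the setup concrete: "indexed on symbolName and also datetime" presumably means two separate single-column indexes, something like the following sketch (the index names are assumptions, not taken from the question):

```sql
-- Assumed shape of the existing indexes (names are hypothetical):
CREATE INDEX idx_symbolname ON daily_price_history (symbolName);
CREATE INDEX idx_datetime   ON daily_price_history (datetime);
```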

Getting ~150K rows of data (1500 symbols × 100 days) takes 20 secs.

I also have weekly_price_history and monthly_price_history tables. When I run a similar SQL against them, they take less time for the same number (~150K) of rows, because they hold less data than the daily table:

weekly_price_history: getting 150K rows takes 3s. The table holds 2.5 million rows in total.

monthly_price_history: getting 150K rows takes 1s. The table holds 800K rows in total.

So how can I speed this up when the table is large?

As a starter: I don't see the point of the subquery at all. Presumably, your query can filter directly in the WHERE clause:

select id, symbolid_id, volume, close
from daily_price_history
where datetime >= 1598471533546 and symbolname in ('A', 'AA', ...)

Then, you want an index on (datetime, symbolname):

create index idx_daily_price_history 
    on daily_price_history(datetime, symbolname)
;

The first column of the index matches the predicate on datetime. It is not very likely, however, that the database will be able to use the index to filter symbolname against a large list of values.
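Whether the optimizer actually picks the new index can be checked with EXPLAIN; a sketch, assuming the table and index defined above (the symbol list shortened for readability):

```sql
EXPLAIN
SELECT id, symbolid_id, volume, close
FROM daily_price_history
WHERE datetime >= 1598471533546
  AND symbolname IN ('A', 'AA');
```

If the key column of the output shows idx_daily_price_history and the rows estimate is far below 11 million, the index is doing its job.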

An alternative would be to put the list of values in a table, say symbolnames .

create table symbolnames (
    symbolname varchar(50) primary key
);
insert into symbolnames values ('A'), ('AA'), ...; 

Then you can do:

select p.id, p.symbolid_id, p.volume, p.close
from daily_price_history p
inner join symbolnames s on s.symbolname = p.symbolname
where p.datetime >= 1598471533546
That should allow the database to use the above index. We can take this one step further and add the four selected columns to the index, making it a covering index:

create index idx_daily_price_history_2 
    on daily_price_history(datetime, symbolname, id, symbolid_id, volume, close)
;

When you add INDEX(a,b) , remove INDEX(a) as being no longer necessary.
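Applied here: idx_daily_price_history_2 above starts with the same two columns as the earlier idx_daily_price_history, so the narrower index becomes redundant and can be dropped:

```sql
-- idx_daily_price_history_2 begins with (datetime, symbolname),
-- so the index created earlier is no longer needed:
DROP INDEX idx_daily_price_history ON daily_price_history;
```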

Your dataset and query may be a case for using PARTITIONing .

PRIMARY KEY(symbolname, datetime)

PARTITION BY RANGE(datetime) ...

This will do "partition pruning" for datetime >= 1598471533546. The PRIMARY KEY will then do most of the rest of the work for symbolname in ('A', 'AA', ...).

Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.
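Putting the two pieces together, the layout could be sketched as follows (a sketch only: the range boundaries are illustrative millisecond timestamps, a real table would carry far more partitions, and any existing PRIMARY KEY on id would have to be dropped or reworked first):

```sql
-- Assumes the old primary key has been removed beforehand:
ALTER TABLE daily_price_history
    ADD PRIMARY KEY (symbolname, datetime);

ALTER TABLE daily_price_history
    PARTITION BY RANGE (datetime) (
        PARTITION p2019 VALUES LESS THAN (1577836800000),  -- < 2020-01-01 UTC
        PARTITION p2020 VALUES LESS THAN (1609459200000),  -- < 2021-01-01 UTC
        PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );
```

Note that MySQL requires the partitioning column to appear in every unique key of the table, which the (symbolname, datetime) primary key satisfies.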

Yes, get rid of the subquery as GMB suggests.

Meanwhile, it sounds like Django is getting in the way.

Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
