简体   繁体   中英

SQL performance issue

I have an old mysql query that I have performance problems with, as the number of items have grown suddenly. The main problem is probably blatantly obvious for a seasoned sql-expert, but not so for me unfortunately. The original query runs for at least 30 seconds:

SET SQL_BIG_SELECTS = 1;
SET @start_date = '2018-12-31';
SET @end_date = '2019-08-14';
SET @client_id = 16;

SELECT
    L.instrument_id as iid, 
    I.instrument as instrument, 
    X.portfolio as port, 
    T.ExcludeValue as exclude_value, 
    sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*PS.price*I.scale_factor,0)) as val_start, 
    sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*PE.price*I.scale_factor,0)) as val_end, 
    sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy, 
    sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell, 
    sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other 
FROM ledger L, instruments I, portfolios X, prices PS, prices PE, instrument_types T
WHERE
    L.instrument_id NOT IN (95) and 
    I.instrument_type_id=T.id and 
    I.id=L.instrument_id and 
    X.id=L.portfolio_id and 
    PS.instrument_id=L.instrument_id and 
    PE.instrument_id=L.instrument_id and 
    L.is_current=1 and 
    L.trade_status_id=2 and 
    L.client_id=@client_id and 
    L.trade_date<=@end_date and 
    PS.trade_date=(select min(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date) and 
    PE.trade_date=(select max(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date)
GROUP BY L.instrument_id, L.portfolio_id

This query is extremely slow. I'm guessing that the problem lies in the sub-selects of PS and PE. They are needed as the min/max trade_date are individual for each instrument.

If I split it up into 2 queries, the first runs in 3 ms:

SELECT
    L.instrument_id as iid, 
    I.instrument as instrument, 
    X.portfolio as port, 
    T.ExcludeValue as exclude_value, 
    sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*I.scale_factor,0)) as val_start, 
    sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*I.scale_factor,0)) as val_end, 
    sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy, 
    sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell, 
    sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other 
FROM ledger L, instruments I, portfolios X, prices PS, prices PE, instrument_types T
WHERE
    L.instrument_id NOT IN (95) and 
    I.instrument_type_id=T.id and 
    I.id=L.instrument_id and 
    X.id=L.portfolio_id and 
    L.is_current=1 and 
    L.trade_status_id=2 and 
    L.client_id=@client_id and 
    L.trade_date<=@end_date 
GROUP BY L.instrument_id, L.portfolio_id

and the second, an alternative price query, runs in 0.3 seconds:

SELECT d.instrument_id, d.dt_min as dS, ps.price as pS, d.dt_max as dE, pe.price as pE 
from (select p1.instrument_id, min(p1.trade_date) as dt_min, max(p1.trade_date) as dt_max from prices p1 where p1.is_current=1 and p1.trade_date>=@start_date and p1.trade_date<=@end_date group by p1.instrument_id) d
left join prices ps on ps.trade_date=d.dt_min
left join prices pe on pe.trade_date=d.dt_max
where ps.instrument_id=d.instrument_id and pe.instrument_id=d.instrument_id

Not sure how I can put the two together in the most efficient way, or even if there is some fundamental issue with them to begin with...any help appreciated.

You can convert the use of a correlated subquery such as:

PS.trade_date=(select min(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date) 

into a non-correlated one like this:

(L.instrument_id, PS.trade_date) 
   IN (SELECT instrument_id, MIN(trade_date) 
       FROM prices 
       WHERE is_current=1 
           AND trade_date BETWEEN @start_date AND @end_date 
       GROUP BY instrument_id
   )

In the correlated version you are using, the subquery is executed independently for every outer result candidate; in the non-correlated one I present, the subquery is executed once and the outer query's candidate results are checked against that set.

In cases an the outer query returns few rows, and or the subquery data would be relatively huge if not correlated, correlated subqueries can be helpful; but in my experience, in most situations non-correlated versions usually perform much better.


Using explicit join syntax, makes the approach of using only one subquery more obvious:

...
INNER JOIN (
   SELECT instrument_id, MIN(trade_date) AS start_date, MAX(trade_date) AS end_date
   FROM prices 
   WHERE is_current=1 
       AND trade_date BETWEEN @start_date AND @end_date 
   GROUP BY instrument_id
) AS pb ON L.instrument_id = pb .instrument_id
INNER JOIN prices AS PS ON pb.instrument_id = PS.instrument_id AND pb.start_date = PS.trade_date
INNER JOIN prices AS PE ON pb.instrument_id = PE.instrument_id AND pb.end_date = PE.trade_date

Following Uueerdo leads, the query has become:

SET SQL_BIG_SELECTS = 1;
SET @start_date = '2018-12-31';
SET @end_date = '2019-08-14';
SET @client_id = 16;

SELECT
    L.instrument_id as iid, 
    I.instrument as instrument, 
    X.portfolio as port, 
    T.ExcludeValue as exclude_value, 
    sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*PS.price*I.scale_factor,0)) as val_start, 
    sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*PE.price*I.scale_factor,0)) as val_end, 
    sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy, 
    sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell, 
    sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other 
FROM ledger L
LEFT JOIN instruments I ON L.instrument_id=I.id
LEFT JOIN portfolios X ON L.portfolio_id=X.id
LEFT JOIN instrument_types T on I.instrument_type_id=T.id

INNER JOIN (
   SELECT instrument_id, MIN(trade_date) AS start_date, MAX(trade_date) AS end_date
   FROM prices 
   WHERE is_current=1 
       AND trade_date>=@start_date AND trade_date<=@end_date 
   GROUP BY instrument_id
) AS pb ON L.instrument_id = pb.instrument_id
INNER JOIN prices AS PS ON pb.instrument_id = PS.instrument_id AND pb.start_date = PS.trade_date
INNER JOIN prices AS PE ON pb.instrument_id = PE.instrument_id AND pb.end_date = PE.trade_date

It's many times faster now...

Thank you!!!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM