
MySQL - Joining two tables with datetime columns on Date and three most recent entries before that date

I have two tables in MySQL.

For example Table1 - ItemPrice :

DATETIME   | ITEM | PRICE
2011-08-28 | ABC  | 123
2011-09-01 | ABC  | 125
2011-09-02 | ABC  | 124
2011-09-03 | ABC  | 127
2011-09-04 | ABC  | 126

Table2 - DayScore :

DATETIME   | ITEM | SCORE
2011-08-28 | ABC  | 1
2011-08-29 | ABC  | 8
2011-09-01 | ABC  | 4
2011-09-02 | ABC  | 2
2011-09-03 | ABC  | 7
2011-09-04 | ABC  | 3

I want to write a query which, given an item ID (e.g. ABC), will return the price on the queried date from ItemPrice (if there is no price for that date, the query should not return anything). If a valid price is found for the query date, the query should also return (in 9 columns):

  • the prices of the item from ItemPrice for the past three days (i.e. the 3 most recent prices before the date queried).
  • In the next three columns, the matching scores from DayScore for those 3 dates selected from ItemPrice.
  • Finally, the dates selected (t-1 to t-3).

In other words, the result of this query for date='2011-09-03' and item='ABC' would be:

DATE      |  ITEM  |  PRICE  |  SCR  | PRC_t-1 | PRC_t-2 | PRC_t-3 | SCR_t-1 | SCR_t-2 | SCR_t-3 | DATE_t-1  | DATE_t-2  | DATE_t-3 
2011-09-03|  ABC   |  127    |  7    | 124     | 125     | 123     | 2       | 4       | 1       | 2011-09-02| 2011-09-01| 2011-08-28
....

And so on for each date that appears in the ItemPrice table.

What is the neatest and most efficient way to run this query (as it's something that will be run over many millions of rows)?

Cheers!

It's not pretty, but it does produce the results. You could probably get rid of some subselects and make the SQL a bit shorter, but I built it up in steps so you can deduce what it is doing.

The core part is this select:

SELECT 
  Sub2.*
, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
FROM
   (SELECT 
        Sub1.*
      , (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
    FROM
       (SELECT 
            ItemPrice.DateTime
          , (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1 
        From ItemPrice) Sub1
   ) Sub2

This returns a table with the dates (now, t-1, t-2, t-3). From there it is a simple matter of joining in the price and score for each of those dates. The whole thing, including test data, then becomes this bulk of SQL:

/*
CREATE TABLE ItemPrice (datetime Date, item varchar(3), price int);
CREATE TABLE DayScore ( datetime Date, item varchar(3), score int);

INSERT INTO ItemPrice VALUES ('20110828', 'ABC', 123);
INSERT INTO ItemPrice VALUES ('20110901', 'ABC', 125);
INSERT INTO ItemPrice VALUES ('20110902', 'ABC', 124);
INSERT INTO ItemPrice VALUES ('20110903', 'ABC', 127);
INSERT INTO ItemPrice VALUES ('20110904', 'ABC', 126);

INSERT INTO DayScore VALUES ('20110828', 'ABC', 1);
INSERT INTO DayScore VALUES ('20110829', 'ABC', 8);
INSERT INTO DayScore VALUES ('20110901', 'ABC', 4);
INSERT INTO DayScore VALUES ('20110902', 'ABC', 2);
INSERT INTO DayScore VALUES ('20110903', 'ABC', 7);
INSERT INTO DayScore VALUES ('20110904', 'ABC', 3);
*/

SELECT Hist.*, Current.Item, Current.Price, Current.Score
, Minus1.Price as PRC_1, Minus1.Score SCR_1
, Minus2.Price as PRC_2, Minus2.Score SCR_2
, Minus3.Price as PRC_3, Minus3.Score SCR_3
FROM 
    (SELECT Sub2.*, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
    FROM
        (SELECT Sub1.*, (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
        FROM
            (SELECT ItemPrice.DateTime, (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1 From ItemPrice) Sub1) Sub2) Hist 
INNER JOIN
    (SELECT ItemPrice.DateTime, ItemPrice.Item, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) Current
ON (Current.DateTime = Hist.DateTime)
LEFT JOIN
    (SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) Minus1
ON (Minus1.DateTime = Hist.T_1)
LEFT JOIN
    (SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) Minus2
ON (Minus2.DateTime = Hist.T_2)
LEFT JOIN
    (SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) Minus3
ON (Minus3.DateTime = Hist.T_3)
WHERE Current.Item = 'ABC'

;

/*
DROP TABLE ItemPrice;
DROP TABLE DayScore;
*/
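
Each MAX(...) subquery above looks up "the latest date before X", so on a large table it will likely want an index it can seek on. A minimal sketch, assuming (item, datetime) identifies a row in each table; the index names are made up for illustration:

```sql
-- Hypothetical index names; assumes one row per (item, datetime) in each table.
CREATE INDEX idx_itemprice_item_dt ON ItemPrice (item, datetime);
CREATE INDEX idx_dayscore_item_dt  ON DayScore  (item, datetime);

-- Prefix a query with EXPLAIN to see whether the index is used, for example
-- on a single "previous date" lookup restricted to one item:
EXPLAIN
SELECT MAX(IP.datetime)
FROM ItemPrice AS IP
WHERE IP.item = 'ABC'
  AND IP.datetime < '2011-09-03';
```

Note that the lookup in this sketch also filters on item, which the subqueries in the big query do not do. With more than one item in ItemPrice you would probably want to add that condition there as well, otherwise T_1 to T_3 can pick up dates that belong to other items.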

I'm curious about your explain plan when you do this on 1M rows :) It might not even be that horrible if you have the right indexes which you probably do.
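
As an alternative, if you are on MySQL 8.0 or later, window functions can replace the nested MAX subqueries. This is a different technique from the query above, not a tuned version of it. A sketch, assuming at most one row per (item, datetime) in each table; the output column names are my own choice:

```sql
-- Requires MySQL 8.0+ (CTEs and window functions).
WITH priced AS (
    -- One row per price date, with the score for that same date if there is one.
    SELECT ip.datetime, ip.item, ip.price, ds.score
    FROM ItemPrice AS ip
    LEFT JOIN DayScore AS ds
           ON ds.item = ip.item
          AND ds.datetime = ip.datetime
)
SELECT datetime          AS date_t,
       item,
       price,
       score             AS scr,
       LAG(price, 1)    OVER w AS prc_t1,
       LAG(price, 2)    OVER w AS prc_t2,
       LAG(price, 3)    OVER w AS prc_t3,
       LAG(score, 1)    OVER w AS scr_t1,
       LAG(score, 2)    OVER w AS scr_t2,
       LAG(score, 3)    OVER w AS scr_t3,
       LAG(datetime, 1) OVER w AS date_t1,
       LAG(datetime, 2) OVER w AS date_t2,
       LAG(datetime, 3) OVER w AS date_t3
FROM priced
WHERE item = 'ABC'
WINDOW w AS (PARTITION BY item ORDER BY datetime)
ORDER BY datetime;
```

Two things to keep in mind. The WHERE clause is applied before the window functions, so filtering on item here is fine, but filtering on a single date in the same WHERE would remove the earlier rows that LAG needs; wrap the query in an outer SELECT and filter on date_t there instead. Also, unlike the query above, a price date with no matching DayScore row still appears here, with NULL in scr, because of the LEFT JOIN.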

