[英]MySQL - Joining two tables with datetime columns on Date and three most recent entries before that date
I have two tables in my SQL 我的SQL中有两个表
For example Table1 - ItemPrice : 例如表1-ItemPrice :
DATETIME | ITEM | PRICE
2011-08-28 | ABC 123
2011-09-01 | ABC 125
2011-09-02 | ABC 124
2011-09-03 | ABC 127
2011-09-04 | ABC 126
Table2 - DayScore : 表2-DayScore :
DATETIME | ITEM | SCORE
2011-08-28 | ABC 1
2011-08-29 | ABC 8
2011-09-01 | ABC 4
2011-09-02 | ABC 2
2011-09-03 | ABC 7
2011-09-04 | ABC 3
I want to write a query, which given a item ID (eg ABC ), will return the price at that date from ItemPrice
(of there is no price for that date then the query should not return anything). 我想编写一个查询,给定一个商品ID(例如ABC ),它将从
ItemPrice
返回该日期的价格(该日期没有价格,则查询不应返回任何内容)。 If a valid price is found for the query date, the query should return (in 9 columns) 如果找到查询日期的有效价格,则查询应返回(9列)
ItemPrice
for the past three days (ie the most recent 3 prices before the date queried). ItemPrice
的商品价格(即查询日期之前的最新3个价格)。 DayScore
, the matching score for those 3 dates selected from ItemPrice. DayScore
返回从DayScore
选择的那三个日期的匹配分数。 In otherwords the results for this query looking at just date='2011-09-03' as an example for item='abc' would return: 换句话说,此查询的结果仅以date ='2011-09-03'为例(例如item ='abc')将返回:
DATE | ITEM | PRICE | SCR | PRC_t-1 | PRC_t-2 | PRC_t-3 | SCR_t-1 | SCR_t-2 | SCR_t-3 | DATE_t-1 | DATE_t-2 | DATE_t-3
2011-09-03| ABC | 127 | 7 | 124 | 125 | 123 | 2 | 4 | 1 | 2011-09-02| 2011-09-01| 2011-08-28
....
Etc for each date that appears in ItemPrice
table. 等等出现在
ItemPrice
表中的每个日期。
What is the neatest and most efficient way to run this query (as its something that will be run over many millions of rows)? 什么是最新颖,最有效的方法来运行此查询(因为它将在数百万行上运行)?
Cheers! 干杯!
Pretty no but it does produce the results. 完全没有,但是确实可以产生结果。 You could probably get rid of some subselects and make it a bit less sql but I tried to build it up in steps so you can deduct what it is doing.
您可能会摆脱一些子选择,使它的sql少一些,但我尝试逐步进行构建,以便可以推断出它在做什么。
The core part is this select: 核心部分是此选择:
SELECT
Sub2.*
, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
FROM
(SELECT
Sub1.*
, (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
FROM
(SELECT
ItemPrice.DateTime
, (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1
From ItemPrice) Sub1
) Sub2
This returns a table with the dates (now, t-1, t-2, t-3). 这将返回一个带有日期的表(现在是t-1,t-2,t-3)。 From there is is simple joining with price and score for each of those dates.
从那里可以轻松地将每个日期的价格和分数连接起来。 The whole things including testdata the becomes this bulk of sql
包括testdata在内的所有内容都变成了这大部分sql
/*
CREATE TABLE ItemPrice (datetime Date, item varchar(3), price int);
CREATE TABLE DayScore ( datetime Date, item varchar(3), score int);
INSERT INTO ItemPrice VALUES ('20110828', 'ABC', 123);
INSERT INTO ItemPrice VALUES ('20110901', 'ABC', 125);
INSERT INTO ItemPrice VALUES ('20110902', 'ABC', 124);
INSERT INTO ItemPrice VALUES ('20110903', 'ABC', 127);
INSERT INTO ItemPrice VALUES ('20110904', 'ABC', 126);
INSERT INTO DayScore VALUES ('20110828', 'ABC', 1);
INSERT INTO DayScore VALUES ('20110829', 'ABC', 8);
INSERT INTO DayScore VALUES ('20110901', 'ABC', 4);
INSERT INTO DayScore VALUES ('20110902', 'ABC', 2);
INSERT INTO DayScore VALUES ('20110903', 'ABC', 7);
INSERT INTO DayScore VALUES ('20110904', 'ABC', 3);
*/
SELECT Hist.*, Current.Item, Current.Price, Current.Score
, Minus1.Price as PRC_1, Minus1.Score SCR_1
, Minus2.Price as PRC_2, Minus2.Score SCR_2
, Minus3.Price as PRC_3, Minus3.Score SCR_3
FROM
(SELECT Sub2.*, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
FROM
(SELECT Sub1.*, (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
FROM
(SELECT ItemPrice.DateTime, (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1 From ItemPrice) Sub1) Sub2) Hist
INNER JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Item, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) CURRENT
ON (Current.DateTime = Hist.DateTime)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS1
ON (Minus1.DateTime = Hist.T_1)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS2
ON (Minus2.DateTime = Hist.T_2)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS3
ON (Minus3.DateTime = Hist.T_3)
WHERE Current.Item = 'ABC'
;
/*
DROP TABLE ItemPrice;
DROP TABLE DayScore;
*/
I'm curious about your explain plan when you do this on 1M rows :) It might not even be that horrible if you have the right indexes which you probably do. 当您对1M行执行此操作时,我对您的解释计划感到好奇:)如果您具有正确的索引(可能会执行),甚至可能还不那么可怕。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.