SQL: Calculate the change in values over an n-day period
All,
I have a table that looks like this:
Date Pitcher WHIP
-------- -------------- -----
7/4/12 JACKSON, E 1.129
7/4/12 YOUNG, C 1.400
7/4/12 CORREIA, K 1.301
7/4/12 WOLF, R 1.594
...
6/28/12 JACKSON, E 1.137
6/27/12 YOUNG, C 1.750
...
6/19/12 JACKSON, E 1.215
6/17/12 YOUNG, C 1.851
I've set up a SQLFiddle here: http://sqlfiddle.com/#!2/addfe/1
In other words, the table lists the starting pitcher for every game of the MLB season, along with that pitcher's current WHIP (WHIP is a measure of the pitcher's performance).
What I'd like to obtain from my query is this: how much has that pitcher's WHIP changed in the last 30 days?
Or, more precisely, how much has that pitcher's WHIP changed since his most recent start that was at least 30 days ago?
So, for example, if E. Jackson's WHIP on 7/4/12 was 1.129, and his WHIP on 6/3/12 was 1.500, then I'd like to know that his WHIP changed by -0.371.
This is easy to figure out for any individual pitcher, but I want to calculate it for all pitchers, on all dates.
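For one pitcher and one start date, the lookup could be sketched roughly as follows. This assumes MySQL syntax, that date is a DATE column, and the table and column names (mlb_data, home_pitcher, date, All_starts_whip) used in the queries later in this post; the pitcher name and date are just illustrative values from the sample rows.

```sql
-- Sketch: WHIP change for a single pitcher on a single date (MySQL assumed).
SELECT
    cur.All_starts_whip                         AS whip_now,
    prev.All_starts_whip                        AS whip_30d_ago,
    cur.All_starts_whip - prev.All_starts_whip  AS whip_change
FROM mlb_data cur
JOIN mlb_data prev
    ON prev.home_pitcher = cur.home_pitcher
WHERE cur.home_pitcher = 'JACKSON, E'
  AND cur.date = '2012-07-04'
  -- most recent earlier start that is at least 30 days before the current one
  AND prev.date = (
      SELECT MAX(p.date)
      FROM mlb_data p
      WHERE p.home_pitcher = cur.home_pitcher
        AND p.date <= DATE_SUB(cur.date, INTERVAL 30 DAY)
  );
```

If the pitcher has no start that far back, the inner join drops the row, so this returns nothing rather than a NULL change.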
One of the things that makes this tricky is that there isn't data for every date. For example, if E. Jackson pitched on 7/4/12, his most recent start that's at least 30 days earlier might be on 5/28/2012.
However, for K. Correia, who also pitched on 7/4/12, the most recent start that's at least 30 days earlier might be 5/26/2012.
I'm assuming that I need to join the table to itself, but I'm not sure how to do it.
Here's my first stab:
select
    t1.home_pitcher,
    t1.date,
    t1.All_starts_whip,
    t2.All_starts_whip
from
    mlb_data t1
join
    mlb_data t2
on
    t1.home_pitcher = t2.home_pitcher
and
    t2.date = (
        select max(t3.date)
        from mlb_data t3
        where t3.home_pitcher = t1.home_pitcher
          and t3.date < date_sub(t1.date, interval 1 month)
    )
This seems to work (and hopefully illustrates what I'm trying to capture), but it takes HORRENDOUSLY long: my table goes back a few seasons and has about 6,250 rows, and this query took 7,289 seconds (yes, that's correct, more than 2 hours). I'm sure this is a classic case of the absolute worst way to write a query.
[UPDATE] Some clarification...
The query should produce a value for EACH pitcher for EACH start.
In other words, if E. Jackson pitched in 10 games, he'd be listed in the result set 10 times.
Date Pitcher WHIP WHIP_30d_ago
-------- -------------- ----- ------------
7/4/12 JACKSON, E 1.129 1.111
...
5/18/12 JACKSON, E 1.111 2.222
...
4/14/12 JACKSON, E 2.222 3.333
In other words, I'm looking for a 30-day trailing WHIP for each start.
Many thanks in advance!
I don't think you need a self join for that. You can use a subquery like this:
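-- MySQL cannot reference the alias previous_whip in the same SELECT list,
-- so it is computed in a derived table and subtracted in the outer query.
SELECT
    s.home_pitcher,
    s.date,
    s.All_starts_whip,
    s.previous_whip,
    s.All_starts_whip - s.previous_whip AS whip_change
FROM (
    SELECT
        t1.home_pitcher,
        t1.date,
        t1.All_starts_whip,
        -- WHIP from this pitcher's latest start more than one month before t1.date
        (SELECT t2.All_starts_whip
         FROM mlb_data t2
         WHERE t2.home_pitcher = t1.home_pitcher
           AND t2.date < date_sub(t1.date, interval 1 month)
         ORDER BY t2.date DESC
         LIMIT 1) AS previous_whip
    FROM mlb_data t1
) s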
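The correlated subquery runs once for every row of t1, so it helps if each lookup can be answered from an index. A possible sketch, assuming the table has no suitable index yet (the index name idx_pitcher_date is made up for illustration, and whether it actually helps depends on your existing indexes and MySQL version):

```sql
-- Hypothetical index name; lets each lookup seek by pitcher, then by date.
CREATE INDEX idx_pitcher_date ON mlb_data (home_pitcher, date);

-- Inspect the plan: the "key" column should show the index for the subquery.
EXPLAIN
SELECT
    t1.home_pitcher,
    t1.date,
    (SELECT t2.All_starts_whip
     FROM mlb_data t2
     WHERE t2.home_pitcher = t1.home_pitcher
       AND t2.date < DATE_SUB(t1.date, INTERVAL 1 MONTH)
     ORDER BY t2.date DESC
     LIMIT 1) AS previous_whip
FROM mlb_data t1;
```

With the pitcher column first and the date second, the ORDER BY ... DESC LIMIT 1 lookup can read a single index entry per start instead of scanning all of that pitcher's rows.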