Using a non-null conditional LAG() function in MySQL
I tried applying a couple of solutions from here, but my question seems to be somewhat different from the OP's in this post.
I have a large dataset, data, in MySQL:
id date val
aaaaa 2021-01-01 TRUE
aaaaa 2021-01-02 FALSE
aaaaa 2021-01-03 FALSE
aaaaa 2021-01-04 TRUE
aaaaa 2021-01-05 FALSE
aaaaa 2021-01-06 TRUE
aaaaa 2021-01-07 FALSE
...
aaaaa 2021-12-31 FALSE
aaaab 2021-01-01 TRUE
aaaab 2021-01-02 FALSE
...
zzzzz 2021-12-31 FALSE
Here, id is a string, date ranges from 2021-01-01 to 2021-12-31 without any missing days, and val contains a boolean value, TRUE or FALSE. data is ordered by id, date.
I would like to add two columns, lagged_date and date_diff.
lagged_date contains the most recent earlier date for the same id where val = TRUE. date_diff is the number of days between date and lagged_date in that row. Ideally, my final dataset should look like this:
id date val lagged_date date_diff
aaaaa 2021-01-01 TRUE NULL NULL
aaaaa 2021-01-02 FALSE 2021-01-01 1
aaaaa 2021-01-03 FALSE 2021-01-01 2
aaaaa 2021-01-04 TRUE 2021-01-01 3
aaaaa 2021-01-05 FALSE 2021-01-04 1
aaaaa 2021-01-06 TRUE 2021-01-04 2
aaaaa 2021-01-07 FALSE 2021-01-06 1
...
aaaaa 2021-12-31 FALSE 2021-12-25 6
aaaab 2021-01-01 TRUE NULL NULL
aaaab 2021-01-02 FALSE 2021-01-01 1
...
(Note that this data is also ordered by id, date.)
I tried the following query:
SELECT *,
MAX(val) OVER (
PARTITION BY id, val
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS lagged_date,
DATE_DIFF(date, lagged_date, DAY) AS date_diff
FROM data
but lagged_date does not give my desired output; it only returns the lagged val. I also tried MAX(date), but to no avail.
Any insight is appreciated.
Use a correlated subquery to find, for each row, the previous date for the same id where val is TRUE, then use the DATEDIFF function (MySQL's name for it; DATE_DIFF is not a MySQL function).
SELECT t1.id, t1.date, t1.val,
       (SELECT t2.date
        FROM data t2
        WHERE t2.id = t1.id AND t2.val = TRUE AND t2.date < t1.date
        ORDER BY t2.date DESC
        LIMIT 1) AS lagged_date,
       DATEDIFF(t1.date,
                (SELECT t2.date
                 FROM data t2
                 WHERE t2.id = t1.id AND t2.val = TRUE AND t2.date < t1.date
                 ORDER BY t2.date DESC
                 LIMIT 1)) AS date_diff
FROM data t1
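As an alternative to the correlated subquery, the window-function attempt from the question can probably be made to work with two changes. First, partition by id only: PARTITION BY id, val puts TRUE and FALSE rows in separate partitions, so a FALSE row never sees the TRUE rows before it. Second, take the maximum of a conditional date rather than of val, so that only rows with val = TRUE contribute. The following is a minimal sketch, not a tested solution for this dataset. It assumes MySQL 8.0 or later (earlier versions have no window functions) and that val is stored as BOOLEAN / TINYINT(1); if val is actually the string 'TRUE', the CASE condition would need to be val = 'TRUE' instead.

```sql
-- Sketch: running "last TRUE date before this row" per id.
-- Assumes MySQL 8.0+ and a BOOLEAN / TINYINT(1) val column.
SELECT id, date, val, lagged_date,
       -- DATEDIFF(a, b) returns a - b in days; NULL if lagged_date is NULL.
       DATEDIFF(date, lagged_date) AS date_diff
FROM (
    SELECT id, date, val,
           -- CASE yields the date only for TRUE rows and NULL otherwise;
           -- MAX ignores NULLs, so it returns the latest TRUE date in the frame.
           MAX(CASE WHEN val THEN date END) OVER (
               PARTITION BY id
               ORDER BY date
               -- The frame stops one row before the current row,
               -- so a TRUE row does not pick up its own date.
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS lagged_date
    FROM data
) AS t
ORDER BY id, date;
```

The derived table is there because a column alias such as lagged_date cannot be referenced by another expression in the same SELECT list. On a large table this form reads each partition once, whereas the correlated subquery is evaluated per row, so it may scale better, though that depends on the available indexes.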