
Using a non-null conditional LAG() function in MySQL

I tried applying a couple of solutions from here, but my question seems to be somewhat different from the OP's in this post.


I have a large dataset, data, in MySQL:

id          date          val
aaaaa       2021-01-01    TRUE
aaaaa       2021-01-02    FALSE
aaaaa       2021-01-03    FALSE
aaaaa       2021-01-04    TRUE
aaaaa       2021-01-05    FALSE
aaaaa       2021-01-06    TRUE
aaaaa       2021-01-07    FALSE
...
aaaaa       2021-12-31    FALSE
aaaab       2021-01-01    TRUE
aaaab       2021-01-02    FALSE
...
zzzzz       2021-12-31    FALSE

Here, id is a string, date ranges from 2021-01-01 to 2021-12-31 with no missing days, and val is a boolean, TRUE or FALSE. data is ordered by id, date.

I would like to add two columns, lagged_date and date_diff.

  • lagged_date contains the most recent earlier date for the same id on which val = TRUE.
  • date_diff is the number of days between date and lagged_date in that row.

Ideally, my final dataset should look like this:

id          date          val        lagged_date     date_diff
aaaaa       2021-01-01    TRUE       NULL            NULL
aaaaa       2021-01-02    FALSE      2021-01-01      1
aaaaa       2021-01-03    FALSE      2021-01-01      2
aaaaa       2021-01-04    TRUE       2021-01-01      3
aaaaa       2021-01-05    FALSE      2021-01-04      1
aaaaa       2021-01-06    TRUE       2021-01-04      2
aaaaa       2021-01-07    FALSE      2021-01-06      1
...
aaaaa       2021-12-31    FALSE      2021-12-25      6
aaaab       2021-01-01    TRUE       NULL            NULL
aaaab       2021-01-02    FALSE      2021-01-01      1
...

(Note that this data is also ordered by id, date.)

I tried the following query:

SELECT *,
       MAX(val) OVER (
          PARTITION BY id, val
          ORDER BY date
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS lagged_date,
       DATE_DIFF(date, lagged_date, DAY) AS date_diff
  FROM data

but lagged_date does not contain my desired output; it only contains the lagged val. I also tried MAX(date), but to no avail.

Any insight is appreciated.

Use a correlated subquery to find, for each id, the most recent earlier date where val is TRUE, then apply the DATEDIFF function.

SELECT t1.id, t1.date, t1.val,
  (SELECT t2.date FROM data t2
    WHERE t2.id = t1.id AND t2.val = TRUE AND t2.date < t1.date
    ORDER BY t2.date DESC LIMIT 1) AS lagged_date,
  DATEDIFF(t1.date,
    (SELECT t2.date FROM data t2
      WHERE t2.id = t1.id AND t2.val = TRUE AND t2.date < t1.date
      ORDER BY t2.date DESC LIMIT 1)) AS date_diff
FROM data t1
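
As a possible alternative, the same result can probably be obtained with a single window function instead of two correlated subqueries. The sketch below assumes MySQL 8.0 or later (window functions are not available in earlier versions) and assumes val is stored as BOOLEAN/TINYINT(1); if val is a string column, the CASE condition would need to be val = 'TRUE' instead. The derived-table alias t is only illustrative.

```sql
-- Sketch only: assumes MySQL 8.0+ and a BOOLEAN/TINYINT(1) val column.
SELECT id,
       date,
       val,
       lagged_date,
       -- DATEDIFF returns NULL when lagged_date is NULL (no earlier TRUE row)
       DATEDIFF(date, lagged_date) AS date_diff
FROM (
    SELECT id,
           date,
           val,
           -- Keep the date only on TRUE rows, then take the latest such date
           -- among the strictly preceding rows of the same id.
           MAX(CASE WHEN val THEN date END) OVER (
               PARTITION BY id
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS lagged_date
    FROM data
) AS t
ORDER BY id, date;
```

The inner query is wrapped in a derived table because a column alias such as lagged_date cannot be referenced by another expression in the same SELECT list. Partitioning by id alone, rather than by id and val, keeps the FALSE rows in the same window as the TRUE rows, so every row can see the preceding TRUE dates.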
