
How to change a column based on the difference between dates within a group?

This is probably a simple problem, but I am quite new to SQL. I am using Impala. I have data like this:

New_ID  Date                 Old_ID
1       2020-11-14 12:41:21  0
1       2020-11-14 12:50:40  1
2       2020-10-14 15:22:00  1.5
2       2020-12-18 11:31:05  2
3       2020-11-14 12:42:25  3

Grouping by New_ID, I need to check whether the difference between each date and the date immediately following it (if one exists) is less than 2 months (I'll assume that means 60 days). If the difference is greater than 2 months, I need to change the New_ID to the Old_ID. If it is less than or equal to 2 months, the New_ID can remain the same. Essentially, I would like the new table to look like this:

New_ID  Date                 Old_ID
1       2020-11-14 12:41:21  0
1       2020-11-14 12:50:40  1
1.5     2020-10-14 15:22:00  1.5
2       2020-12-18 11:31:05  2
3       2020-11-14 12:42:25  3

I have tried this code snippet and variations of it, but (1) I am not sure how to handle NULL values, and (2) I keep getting the error "could not resolve column/field reference 'day'":

SELECT New_ID, Old_ID, Date,
LAG(Date) OVER(partition by New_ID ORDER BY Date) as previous_date,
case when datediff(day, previous_date, Date)/30.0 >= 2 then Old_ID
else New_ID end as 'new_identifier'
From MYTABLE;

Any pointers or suggestions would be greatly appreciated.

In Impala, the date function to use is months_between(). The alias previous_date is not recognized within the same SELECT list, so you need to repeat the expression:

SELECT New_ID, Old_ID, Date,
       LAG(Date) OVER (partition by New_ID ORDER BY Date) as previous_date,
       (case when months_between(date, LAG(Date) OVER (partition by New_ID ORDER BY Date)) >= 2 then Old_ID
             else New_ID
         end) as new_identifier
From MYTABLE;
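
Note that LAG compares each row with the previous date in its group, so in the sample data it is the later row of group 2 (2020-12-18) that switches to Old_ID, while the 2020-10-14 row keeps New_ID. The question describes comparing each date with the one immediately following it, which corresponds to LEAD rather than LAG. A minimal sketch of that variant follows, assuming the same MYTABLE and column names as above and using the asker's 60-day approximation. It relies on Impala's two-argument datediff(enddate, startdate), which returns whole days and takes no unit argument; passing `day` as a first argument is the likely cause of the "could not resolve column/field reference" error. For the last row of each group LEAD returns NULL, the comparison is NULL, and the CASE falls through to ELSE, so New_ID is kept. This has not been run against a live Impala instance.

```sql
-- Sketch only: compare each row with the NEXT date in its New_ID group.
SELECT New_ID, Old_ID, Date,
       LEAD(Date) OVER (PARTITION BY New_ID ORDER BY Date) AS next_date,
       CASE
           -- datediff(enddate, startdate) returns the difference in whole days
           WHEN datediff(LEAD(Date) OVER (PARTITION BY New_ID ORDER BY Date), Date) > 60
               THEN Old_ID
           -- also reached when there is no following row (NULL comparison)
           ELSE New_ID
       END AS new_identifier
FROM MYTABLE;
```

On the sample data, the 2020-10-14 and 2020-12-18 rows are 65 days apart, so only the 2020-10-14 row would take its Old_ID (1.5), which matches the desired table in the question.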
