SQL/Teradata: Return records where value in consecutive rows is the same
I have a data set that looks like:
ID    date    emp_num  loc
1111  5/2/16  111111   Brooklyn
1112  5/3/16  222222   Detroit
1113  5/3/16  333333   San Diego
1114  5/2/16  333333   Orlando
1115  5/5/16  333333   Brooklyn
1116  5/7/16  111111   Orlando
In this case, I want to return records 1113, 1114, and 1115 because the emp_num in consecutive rows (ordered by ID) is the same.
I use Teradata, but if anyone has a SQL solution for another engine I can usually manage to translate it.
Thank you.
You need to look at the previous and next rows and check whether emp_num is unchanged:
SELECT *
FROM tab
QUALIFY
   MIN(emp_num) -- previous row
   OVER (ORDER BY ID
         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = emp_num
OR
   MIN(emp_num) -- next row
   OVER (ORDER BY ID
         ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = emp_num
In Standard SQL this would be a task for LAG / LEAD, but Teradata doesn't implement them, so you have to rewrite it.
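For engines that lack QUALIFY, the same one-row-frame trick can be moved into a derived table and filtered in an outer WHERE. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in engine (it needs an SQLite build with window functions, 3.25 or later), the table name `tab` is taken from the answer, the date column is called `dt` as in the later answers, and the helper name `consecutive_ids` is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, dt, emp_num, loc).
ROWS = [
    (1111, "5/2/16", 111111, "Brooklyn"),
    (1112, "5/3/16", 222222, "Detroit"),
    (1113, "5/3/16", 333333, "San Diego"),
    (1114, "5/2/16", 333333, "Orlando"),
    (1115, "5/5/16", 333333, "Brooklyn"),
    (1116, "5/7/16", 111111, "Orlando"),
]

# A one-row frame plays the role of LAG/LEAD; the outer WHERE replaces QUALIFY.
QUERY = """
SELECT id
FROM (
    SELECT id,
           emp_num,
           MIN(emp_num) OVER (ORDER BY id
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_emp,
           MIN(emp_num) OVER (ORDER BY id
                              ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_emp
    FROM tab
) AS w
WHERE prev_emp = emp_num OR next_emp = emp_num
ORDER BY id
"""


def consecutive_ids(rows):
    """Return ids whose emp_num equals that of the previous or next row (by id)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tab (id INTEGER, dt TEXT, emp_num INTEGER, loc TEXT)"
        )
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(consecutive_ids(ROWS))
```

On the first and last rows the frame is empty, so MIN returns NULL and the comparison is simply not satisfied; no special handling of the edges is needed.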
First, compute the difference between the row number ordered by id and the row number partitioned by emp_num and ordered by id. Within one emp_num, consecutive rows share the same difference, so this value classifies each emp_num's rows into groups. Then, get the groups which have more than one member (which means there are consecutive rows with the same emp_num value). Finally, select the required columns for those groups.
WITH x AS (SELECT
    t.*,
    ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY emp_num ORDER BY id) AS grp
  FROM t),
grpsneeded
AS (SELECT
    emp_num,
    grp
  FROM x
  -- grp is only unique within one emp_num, so group on both columns
  GROUP BY emp_num, grp
  HAVING COUNT(*) > 1)
SELECT
  x.id,
  x.dt,
  x.emp_num
FROM x
JOIN grpsneeded g
  ON g.emp_num = x.emp_num
 AND g.grp = x.grp
This solution works well with SQL Server.
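To see what the row-number difference looks like on the sample data, here is a small Python sketch of the same arithmetic. The function names `run_keys` and `rows_in_runs` are made up for illustration. It also shows that the difference only identifies a run within a single emp_num: two different emp_num values can end up with the same difference, so it seems safest to treat the pair (emp_num, grp) as the group key rather than grp alone.

```python
from collections import Counter, defaultdict


def run_keys(emp_nums):
    """Return (emp_num, grp) for each row, in id order.

    grp is the overall row number minus the row number within the emp_num,
    mirroring the two ROW_NUMBER() calls in the query.
    """
    seen = defaultdict(int)  # per-emp_num row counter
    keys = []
    for overall_rn, emp in enumerate(emp_nums, start=1):
        seen[emp] += 1
        keys.append((emp, overall_rn - seen[emp]))
    return keys


def rows_in_runs(emp_nums):
    """Return 0-based positions of rows that belong to a run longer than one."""
    keys = run_keys(emp_nums)
    sizes = Counter(keys)  # equivalent of GROUP BY emp_num, grp
    return [i for i, key in enumerate(keys) if sizes[key] > 1]


# emp_num column of the sample data, ordered by ID.
sample = [111111, 222222, 333333, 333333, 333333, 111111]
print(run_keys(sample))      # grp values: 0, 1, 2, 2, 2, 4
print(rows_in_runs(sample))  # positions 2, 3, 4, i.e. IDs 1113 to 1115

# Alternating values: grp alone collides (0, 1, 1, 2) although no run exists.
alternating = ["A", "B", "A", "B"]
print(run_keys(alternating))
print(rows_in_runs(alternating))  # empty, because the key includes emp_num
```

In the alternating case, grouping on grp alone would count two rows for grp = 1 (one "B" and one "A") and report a run that is not there.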
A simpler SQL solution would use the lead and lag functions. As @dnoeth pointed out, Teradata doesn't support these functions. However, this may be useful for other database engines.
select id, dt, emp_num
from (
  select t.*
    ,lead(emp_num) over(order by id) nxt
    ,lag(emp_num) over(order by id) prev
  from t
) x
-- a NULL neighbour (first/last row) never compares equal, so no coalesce is needed
where nxt = emp_num or prev = emp_num
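As a quick check on an engine that does have these functions, the sketch below runs the lead/lag query against the sample data using Python's built-in sqlite3 module. This assumes an SQLite build with window functions (3.25 or later); the table name `t` and column name `dt` follow the answer, and the helper name `lead_lag_matches` is made up for illustration. The comparison is written without coalesce, since a NULL neighbour never compares equal to emp_num.

```python
import sqlite3

# Sample rows from the question: (id, dt, emp_num, loc).
ROWS = [
    (1111, "5/2/16", 111111, "Brooklyn"),
    (1112, "5/3/16", 222222, "Detroit"),
    (1113, "5/3/16", 333333, "San Diego"),
    (1114, "5/2/16", 333333, "Orlando"),
    (1115, "5/5/16", 333333, "Brooklyn"),
    (1116, "5/7/16", 111111, "Orlando"),
]

QUERY = """
SELECT id, dt, emp_num
FROM (
    SELECT t.*,
           LEAD(emp_num) OVER (ORDER BY id) AS nxt,
           LAG(emp_num)  OVER (ORDER BY id) AS prev
    FROM t
) AS x
WHERE nxt = emp_num OR prev = emp_num
ORDER BY id
"""


def lead_lag_matches(rows):
    """Return (id, dt, emp_num) for rows whose neighbour has the same emp_num."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (id INTEGER, dt TEXT, emp_num INTEGER, loc TEXT)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in lead_lag_matches(ROWS):
    print(row)
```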