Trouble using ROW_NUMBER() OVER (PARTITION BY …)
I'm using SQL Server 2008 R2. I have a table named EmployeeHistory with the following structure and sample data:
EmployeeID Date DepartmentID SupervisorID
10001 20130101 001 10009
10001 20130909 001 10019
10001 20131201 002 10018
10001 20140501 002 10017
10001 20141001 001 10015
10001 20141201 001 10014
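Note that over time, employee 10001 has moved between 2 departments and several supervisors. What I want is to list the start and end dates of the employee's time in each department, ordered by the Date field. So the output would look like this: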
EmployeeID DateStart DateEnd DepartmentID
10001 20130101 20131201 001
10001 20131201 20141001 002
10001 20141001 NULL 001
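I was planning to partition the data with the following query, but it fails. The department changes from 001 to 002 and then back to 001, so obviously I can't just partition by DepartmentID... I'm sure I'm overlooking something obvious. Any help? Thanks in advance.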
SELECT * ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
ORDER BY [Date]) RN FROM EmployeeHistory
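To see why this query fails, here is a small sketch that runs it on the question's sample data. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions) as a stand-in for SQL Server; the helper name numbered_by_department is hypothetical. Because both 001 stints fall into one partition, RN does not restart when the employee returns to 001.

```python
import sqlite3

# Question's sample data (SupervisorID omitted; it does not affect grouping).
# Dates are kept as YYYYMMDD strings, which sort correctly as text.
ROWS = [
    (10001, "20130101", "001"),
    (10001, "20130909", "001"),
    (10001, "20131201", "002"),
    (10001, "20140501", "002"),
    (10001, "20141001", "001"),
    (10001, "20141201", "001"),
]


def numbered_by_department():
    """Run the question's ROW_NUMBER() query; return (Date, DepartmentID, RN)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE EmployeeHistory (EmployeeID INTEGER, [Date] TEXT, DepartmentID TEXT)"
    )
    con.executemany("INSERT INTO EmployeeHistory VALUES (?, ?, ?)", ROWS)
    rows = con.execute(
        """
        SELECT [Date], DepartmentID,
               ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
                                  ORDER BY [Date]) AS RN
        FROM EmployeeHistory
        ORDER BY [Date]
        """
    ).fetchall()
    con.close()
    return rows


for row in numbered_by_department():
    print(row)
```

The second 001 stint gets RN 3 and 4 instead of starting again at 1, so RN alone cannot separate the two stints.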
I would do something like this:
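;WITH x
AS (SELECT *,
Row_number()
OVER(
partition BY employeeid
ORDER BY [date]) rn -- the question's table has a Date column, not DateStart
FROM employeehistory)
SELECT *
FROM x x1
LEFT OUTER JOIN x x2
ON x1.employeeid = x2.employeeid -- only pair rows of the same employee
AND x1.rn = x2.rn + 1 -- x2 is the row immediately before x1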
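Or perhaps x1.rn = x2.rn - 1; it depends on which neighbour you want. With x1.rn = x2.rn + 1, x2 is the row immediately before x1; with x1.rn = x2.rn - 1, it is the row immediately after. Either way, you get the idea: once the table is joined to itself, you can filter, group, sort, etc. to get what you need.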
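One way the self-join can be finished, sketched under assumptions: Python's sqlite3 module (SQLite 3.25+) stands in for SQL Server, and the CTE name starts and the function self_join_stints are hypothetical. It uses the predecessor form (x1.rn = x2.rn + 1): a row begins a new stint when it has no predecessor or the predecessor's department differs, and each stint ends where the employee's next stint begins. The correlated MIN subquery is just one way to fetch that next start.

```python
import sqlite3

# Question's sample data (SupervisorID omitted); YYYYMMDD strings sort as dates.
ROWS = [
    (10001, "20130101", "001"),
    (10001, "20130909", "001"),
    (10001, "20131201", "002"),
    (10001, "20140501", "002"),
    (10001, "20141001", "001"),
    (10001, "20141201", "001"),
]

QUERY = """
WITH x AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date]) AS rn
    FROM EmployeeHistory
),
starts AS (
    -- x2 is the row immediately before x1 for the same employee; no predecessor
    -- or a different department means x1 is the first row of a stint.
    SELECT x1.EmployeeID, x1.[Date] AS DateStart, x1.DepartmentID
    FROM x AS x1
    LEFT OUTER JOIN x AS x2
      ON x1.EmployeeID = x2.EmployeeID AND x1.rn = x2.rn + 1
    WHERE x2.rn IS NULL OR x2.DepartmentID <> x1.DepartmentID
)
SELECT s.EmployeeID,
       s.DateStart,
       -- A stint ends where the employee's next stint starts (NULL for the last one).
       (SELECT MIN(n.DateStart) FROM starts AS n
         WHERE n.EmployeeID = s.EmployeeID AND n.DateStart > s.DateStart) AS DateEnd,
       s.DepartmentID
FROM starts AS s
ORDER BY s.EmployeeID, s.DateStart
"""


def self_join_stints():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE EmployeeHistory (EmployeeID INTEGER, [Date] TEXT, DepartmentID TEXT)"
    )
    con.executemany("INSERT INTO EmployeeHistory VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in self_join_stints():
    print(row)
```

The rows come back in the same EmployeeID, DateStart, DateEnd, DepartmentID layout as the output requested in the question.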
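It's somewhat involved. The easiest approach is to look at the SQL Fiddle I created for you, which produces exactly the required result. It could be improved in many ways, for performance or otherwise, but it should at least be clearer than some of the alternatives.
The gist is: first get a canonical ranking of the data, use it to split the data into groups, find the end date for each group, and then eliminate the intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot with readability.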
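The four steps can be mirrored in plain Python to make the idea concrete. This is a hypothetical restatement, not the author's CROSS APPLY script: itertools.groupby stands in for the group-key assignment because it only merges consecutive rows with equal keys, which is exactly the "unbroken run in one department" notion. Tuples are assumed to be (EmployeeID, DateStarted, DepartmentID) with ISO dates, and department_stints is an illustrative name.

```python
from itertools import groupby

# Hypothetical sample mirroring the question, deliberately out of order:
# (EmployeeID, DateStarted, DepartmentID).
HISTORY = [
    (10001, "2013-12-01", 2),
    (10001, "2013-01-01", 1),
    (10001, "2014-10-01", 1),
    (10001, "2013-09-09", 1),
    (10001, "2014-12-01", 1),
    (10001, "2014-05-01", 2),
]


def department_stints(history):
    """Return (EmployeeID, DateStart, DateEnd, DepartmentID), one row per stint."""
    stints = []
    # Step 1: canonical ranking, i.e. order rows by employee, then by date.
    ranked = sorted(history, key=lambda r: (r[0], r[1]))
    for employee, rows in groupby(ranked, key=lambda r: r[0]):
        # Step 2: split into groups of consecutive rows in the same department.
        groups = [list(g) for _, g in groupby(rows, key=lambda r: r[2])]
        for i, group in enumerate(groups):
            # Step 3: a group ends where the next group starts (None for the last).
            end = groups[i + 1][0][1] if i + 1 < len(groups) else None
            # Step 4: keep only the first row of each group.
            first = group[0]
            stints.append((employee, first[1], end, first[2]))
    return stints


for stint in department_stints(HISTORY):
    print(stint)
```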
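Edit 2019:
The SQL Fiddle does indeed seem to be broken for some reason, but that appears to be a problem with the SQL Fiddle site itself. Here is a complete version, just tested on SQL Server 2016: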
CREATE TABLE Source
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)
-- Step 1: rank each employee's rows by date; every row starts with its own GroupKey.
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
newid() as GroupKey,
CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;
-- Step 2: give each row the GroupKey of the first row of its unbroken run in the same department.
UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 GroupKey
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID = sup.DepartmentID AND
NOT EXISTS
(
SELECT *
FROM #RankedData bot
WHERE bot.EmployeeID = sup.EmployeeID AND
bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
bot.DepartmentID <> sup.DepartmentID
)
ORDER BY DateStarted ASC
) beginDate (GroupKey);
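-- Step 3: EndDate = start date of the next row in a different department (stays NULL for the last run).
UPDATE #RankedData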
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 DateStarted
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID <> sup.DepartmentID AND
sub.EntryRank > sup.EntryRank
ORDER BY EntryRank ASC
) nextGroup (DateStarted);
-- Step 4: keep only the first row of each group.
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;
DROP TABLE #RankedData
DROP TABLE Source
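This looks like a classic "gaps and islands" problem. The difference between the two row-number sequences rn1 and rn2 gives the "group" number: rn1 numbers all of an employee's rows by date, while rn2 restarts for each department, so within an unbroken run in one department both advance together and their difference stays constant. The same difference can occur in different departments (for employee 10001 it is 2 for both the 002 stint and the second 001 stint), so the grouping uses DepartmentID together with rn1 - rn2.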
Run this query one CTE at a time and inspect the intermediate results to see how it works.
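As an example of inspecting the first CTE, here is a sketch that evaluates only rn1, rn2 and their difference for employee 10001. It assumes Python's sqlite3 module with SQLite 3.25+ in place of SQL Server, a regular table Source in place of the table variable, and ISO date strings; first_cte_rows and the column alias grp are hypothetical names.

```python
import sqlite3

# Employee 10001 from the sample data: (EmployeeID, DateStarted, DepartmentID).
SOURCE = [
    (10001, "2013-01-01", 1),
    (10001, "2013-09-09", 1),
    (10001, "2013-12-01", 2),
    (10001, "2014-05-01", 2),
    (10001, "2014-10-01", 1),
    (10001, "2014-12-01", 1),
]


def first_cte_rows():
    """Return (DateStarted, DepartmentID, rn1, rn2, rn1 - rn2) for each row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Source (EmployeeID INTEGER, DateStarted TEXT, DepartmentID INTEGER)"
    )
    con.executemany("INSERT INTO Source VALUES (?, ?, ?)", SOURCE)
    rows = con.execute(
        """
        WITH CTE AS (
            SELECT DateStarted, DepartmentID,
                   ROW_NUMBER() OVER (PARTITION BY EmployeeID
                                      ORDER BY DateStarted) AS rn1,
                   ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
                                      ORDER BY DateStarted) AS rn2
            FROM Source
        )
        SELECT DateStarted, DepartmentID, rn1, rn2, rn1 - rn2 AS grp
        FROM CTE
        ORDER BY DateStarted
        """
    ).fetchall()
    con.close()
    return rows


for row in first_cte_rows():
    print(row)
```

Grouping these rows by (DepartmentID, grp) gives the keys (1, 0), (2, 2) and (1, 2), one per stint.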
Sample data
I extended the sample data from the question.
DECLARE @Source TABLE
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO @Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001),
(10005,'2013-05-01',001),
(10005,'2013-11-09',001),
(10005,'2013-12-01',002),
(10005,'2014-10-01',001),
(10005,'2016-12-01',001);
Query for SQL Server 2008
SQL Server 2008 has no LEAD function, so I had to use a self-join via OUTER APPLY to fetch the start date of the "next" group as DateEnd.
WITH
CTE
AS
(
SELECT
EmployeeID
,DateStarted
,DepartmentID
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
FROM @Source
)
,CTE_Groups
AS
(
SELECT
EmployeeID
,MIN(DateStarted) AS DateStart
,DepartmentID
FROM CTE
GROUP BY
EmployeeID
,DepartmentID
,rn1 - rn2
)
SELECT
CTE_Groups.EmployeeID
,CTE_Groups.DepartmentID
,CTE_Groups.DateStart
,A.DateEnd
FROM
CTE_Groups
OUTER APPLY
(
SELECT TOP(1) G2.DateStart AS DateEnd
FROM CTE_Groups AS G2
WHERE
G2.EmployeeID = CTE_Groups.EmployeeID
AND G2.DateStart > CTE_Groups.DateStart
ORDER BY G2.DateStart
) AS A
ORDER BY
EmployeeID
,DateStart
;
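The OUTER APPLY step is a per-row lookup. The plain-Python sketch below reproduces its semantics, assuming the CTE_Groups rows for the extended sample data have already been computed (the list CTE_GROUPS and the function with_date_end are hypothetical). It highlights the OUTER part: an outer row with no later group is kept with a NULL DateEnd, whereas CROSS APPLY would drop it.

```python
# Hypothetical CTE_Groups output for the extended sample data:
# (EmployeeID, DepartmentID, DateStart), one row per department stint.
CTE_GROUPS = [
    (10001, 1, "2013-01-01"),
    (10001, 2, "2013-12-01"),
    (10001, 1, "2014-10-01"),
    (10005, 1, "2013-05-01"),
    (10005, 2, "2013-12-01"),
    (10005, 1, "2014-10-01"),
]


def with_date_end(groups):
    """Attach DateEnd to each group the way the OUTER APPLY subquery does."""
    result = []
    for employee, department, start in groups:
        # Inner query: same employee, strictly later DateStart, earliest first.
        later = sorted(g[2] for g in groups if g[0] == employee and g[2] > start)
        # TOP(1) takes the first candidate; OUTER APPLY keeps the outer row
        # even when there is none, which yields NULL (None here).
        date_end = later[0] if later else None
        result.append((employee, department, start, date_end))
    # ORDER BY EmployeeID, DateStart
    return sorted(result, key=lambda r: (r[0], r[2]))


for row in with_date_end(CTE_GROUPS):
    print(row)
```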
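Query for SQL Server 2012+
Starting with SQL Server 2012 there is a LEAD function, which replaces the OUTER APPLY self-join and makes this task more efficient.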
WITH
CTE
AS
(
SELECT
EmployeeID
,DateStarted
,DepartmentID
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
FROM @Source
)
,CTE_Groups
AS
(
SELECT
EmployeeID
,MIN(DateStarted) AS DateStart
,DepartmentID
FROM CTE
GROUP BY
EmployeeID
,DepartmentID
,rn1 - rn2
)
SELECT
CTE_Groups.EmployeeID
,CTE_Groups.DepartmentID
,CTE_Groups.DateStart
,LEAD(CTE_Groups.DateStart) OVER (PARTITION BY CTE_Groups.EmployeeID ORDER BY CTE_Groups.DateStart) AS DateEnd
FROM
CTE_Groups
ORDER BY
EmployeeID
,DateStart
;
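To check the 2012+ query without a SQL Server instance, it can be run almost verbatim against SQLite through Python's sqlite3 module. This is a stand-in under stated assumptions: SQLite 3.25 or later (for ROW_NUMBER and LEAD), the table variable @Source replaced by an ordinary table Source, dates stored as ISO-8601 text (which sorts chronologically), and run_lead_query as a hypothetical helper. Its rows should correspond to the result table that follows.

```python
import sqlite3

# Extended sample data: (EmployeeID, DateStarted, DepartmentID).
SOURCE = [
    (10001, "2013-01-01", 1), (10001, "2013-09-09", 1), (10001, "2013-12-01", 2),
    (10001, "2014-05-01", 2), (10001, "2014-10-01", 1), (10001, "2014-12-01", 1),
    (10005, "2013-05-01", 1), (10005, "2013-11-09", 1), (10005, "2013-12-01", 2),
    (10005, "2014-10-01", 1), (10005, "2016-12-01", 1),
]

QUERY = """
WITH CTE AS (
    SELECT EmployeeID, DateStarted, DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
                              ORDER BY DateStarted) AS rn2
    FROM Source
),
CTE_Groups AS (
    -- One row per island: same employee, same department, same rn1 - rn2.
    SELECT EmployeeID, MIN(DateStarted) AS DateStart, DepartmentID
    FROM CTE
    GROUP BY EmployeeID, DepartmentID, rn1 - rn2
)
SELECT EmployeeID, DepartmentID, DateStart,
       -- The next island's start date is this island's end date.
       LEAD(DateStart) OVER (PARTITION BY EmployeeID ORDER BY DateStart) AS DateEnd
FROM CTE_Groups
ORDER BY EmployeeID, DateStart
"""


def run_lead_query():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Source (EmployeeID INTEGER, DateStarted TEXT, DepartmentID INTEGER)"
    )
    con.executemany("INSERT INTO Source VALUES (?, ?, ?)", SOURCE)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in run_lead_query():
    print(row)
```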
Result
+------------+--------------+------------+------------+
| EmployeeID | DepartmentID | DateStart | DateEnd |
+------------+--------------+------------+------------+
| 10001 | 1 | 2013-01-01 | 2013-12-01 |
| 10001 | 2 | 2013-12-01 | 2014-10-01 |
| 10001 | 1 | 2014-10-01 | NULL |
| 10005 | 1 | 2013-05-01 | 2013-12-01 |
| 10005 | 2 | 2013-12-01 | 2014-10-01 |
| 10005 | 1 | 2014-10-01 | NULL |
+------------+--------------+------------+------------+
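Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.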