Partition by consecutive dates
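I have a table with three columns (X, Y, Z). X is the identifier. I want to get row numbers when partitioning by column Y, but only while Z runs in consecutive order. For example, I have this table: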
X Y Z
A 1 1-jan
A 1 2-jan
A 1 3-jan
B 3 1-jan
B 3 2-jan
A 1 5-jan
The result should look like this:
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 1
The code I'm using now:
select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn
I get this result (which is not what I want):
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 4 <---- Column Z is not 4-Jan, therefore this should not be row 4. It should be a new row 1
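You first need to create data that you can use to partition the table.
The following uses LAG() to determine whether a row starts a "new partition", then SUM() OVER () to propagate that flag forward and produce a "partition id", and finally ROW_NUMBER() with that identifier.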
WITH
  gap_marker AS
  (
    -- Flag a row with 1 when it does not directly follow the previous date in its y partition
    SELECT
      yourTable.*,
      IIF(
        LAG(z) OVER (PARTITION BY y ORDER BY z)
        =
        DATEADD(day, -1, z),
        0,
        1
      )
        AS new_date_range
    FROM
      yourTable
  ),
  date_range_partition AS
  (
    -- The running total of the flags gives each consecutive run its own id
    SELECT
      gap_marker.*,
      SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
    FROM
      gap_marker
  )
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z) AS rn
FROM
  date_range_partition
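To see why the flag-then-running-sum approach gives the expected numbering, the same three steps can be traced outside the database. This is only a minimal Python sketch, not part of the answer's SQL: the sample rows mirror the question's table with an assumed year of 2022, and `number_islands` is a hypothetical helper name.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows (x, y, z) mirroring the question; the year 2022 is an assumption.
rows = [
    ("A", 1, date(2022, 1, 1)),
    ("A", 1, date(2022, 1, 2)),
    ("A", 1, date(2022, 1, 3)),
    ("B", 3, date(2022, 1, 1)),
    ("B", 3, date(2022, 1, 2)),
    ("A", 1, date(2022, 1, 5)),
]

def number_islands(rows):
    """Hypothetical helper: mimic LAG -> running SUM -> ROW_NUMBER per y."""
    out = []
    # Equivalent of PARTITION BY y ORDER BY z
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, part in groupby(ordered, key=lambda r: r[1]):
        prev_z = None   # LAG(z); None for the first row of a partition
        range_id = 0    # running SUM(new_date_range)
        rn = 0          # ROW_NUMBER() within (y, date_range_id)
        for x, y, z in part:
            # 0 when the previous row is exactly one day earlier, else 1
            new_range = 0 if prev_z == z - timedelta(days=1) else 1
            range_id += new_range
            rn = 1 if new_range else rn + 1
            out.append((x, y, z, range_id, rn))
            prev_z = z
    return out

result = number_islands(rows)
for row in result:
    print(row)
```

The last element of each printed tuple is the row number; it restarts at 1 for 5-jan because that row's flag is 1.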
Alternatively, you can calculate an amount to deduct from the current rn, so that it resets to 1 whenever a date is skipped.
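WITH
  enumerated AS
  (
    -- Plain row number plus the gap in days to the previous row (NULL for the first row)
    SELECT
      yourTable.*,
      ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
      DATEDIFF(
        day,
        LAG(z) OVER (PARTITION BY y ORDER BY z),
        z
      )
        AS delta
    FROM
      yourTable
  )
SELECT
  x, y, z,
  -- Subtract the largest "rn - 1" seen so far at a row that started a new run
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated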
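The deduction can be checked by hand on a single partition. The following Python sketch is illustrative only: it assumes the y = 1 rows from the question (with an assumed year of 2022) are already ordered by z, and `reset_row_numbers` is a hypothetical name.

```python
from datetime import date

# One partition (y = 1), already ordered by z; the year 2022 is an assumption.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 5)]

def reset_row_numbers(dates):
    """Hypothetical helper: rn minus the running MAX of the reset offsets."""
    result = []
    running_max = 0   # MAX(...) OVER (ORDER BY z): largest offset seen so far
    prev = None
    for rn, z in enumerate(dates, start=1):
        # DATEDIFF(day, LAG(z), z); None plays the role of NULL on the first row
        delta = None if prev is None else (z - prev).days
        # IIF(delta = 1, 0, rn - 1): a row that starts a new run records rn - 1
        offset = 0 if delta == 1 else rn - 1
        running_max = max(running_max, offset)
        result.append(rn - running_max)
        prev = z
    return result

numbers = reset_row_numbers(dates)
print(numbers)  # [1, 2, 3, 1]
```

Because rn only grows, the running maximum always equals the offset of the most recent row that started a run, which is why a single MAX() window is enough.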
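Finally, if your rows are always a whole number of days apart, you can use DATEDIFF(). Window functions can work out which date the current row should be compared against, avoiding ROW_NUMBER() entirely.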
WITH
  check_previous AS
  (
    -- Keep z only on rows that start a new run; rows that continue a run get NULL
    SELECT
      yourTable.*,
      IIF(
        LAG(z) OVER (PARTITION BY y ORDER BY z)
        =
        DATEADD(day, -1, z),
        NULL,
        z
      )
        AS new_base_date
    FROM
      yourTable
  )
SELECT
  x, y, z,
  -- Days since the most recent run start, plus 1
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous
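The base-date idea can also be sketched in a few lines of Python. This is an illustration under assumptions (one partition, ordered by z, year 2022 assumed, hypothetical helper name `days_since_base`), not a translation of the query.

```python
from datetime import date, timedelta

# One partition (y = 1), already ordered by z; the year 2022 is an assumption.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 5)]

def days_since_base(dates):
    """Hypothetical helper: number rows as (z - start of current run) + 1."""
    result = []
    base = None   # latest new_base_date; the SQL gets it via a running MAX()
    prev = None
    for z in dates:
        # A row starts a new run unless the previous row is exactly one day earlier
        if prev is None or prev != z - timedelta(days=1):
            base = z
        result.append((z - base).days + 1)
        prev = z
    return result

numbers = days_since_base(dates)
print(numbers)  # [1, 2, 3, 1]
```

Since the dates are ascending, the maximum base date seen so far is always the start of the current run, which is what the sketch tracks directly.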
Demo of all three: https://dbfiddle.uk/K8x8gOqh
Assuming column Z is a date column, you can try the following:
SELECT X, Y, Z,
ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
SELECT *,
DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
FROM table_name
) T
ORDER BY X, Z
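The idea behind the GRP column is that a date minus its row number stays constant within a run of consecutive dates and changes after a gap. Here is a minimal Python sketch of that idea for one partition; the sample dates (year 2022 assumed) and the helper name `number_by_offset` are illustrative, and the sketch subtracts the row number from the date directly rather than reproducing the DATEDIFF call.

```python
from collections import defaultdict
from datetime import date, timedelta

# One partition (x = 'A'); the year 2022 is an assumption.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 5)]

def number_by_offset(dates):
    """Hypothetical helper: group by (date - row number), then count per group."""
    counters = defaultdict(int)
    result = []
    for rn, z in enumerate(sorted(dates), start=1):
        grp = z - timedelta(days=rn)   # constant within a consecutive run
        counters[grp] += 1             # ROW_NUMBER() within the group
        result.append((z, grp, counters[grp]))
    return result

numbered = number_by_offset(dates)
for z, grp, rn in numbered:
    print(z, grp, rn)
```

The first three rows share the group key 2021-12-31, while 5-jan gets 2022-01-01 and therefore starts again at 1.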
If column Z's data type is not a date, you can generate groups of consecutive values as follows:
SELECT X, Y, Z,
ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
SELECT *,
CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) -
ROW_NUMBER() OVER (PARTITION BY X ORDER BY SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z)), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)) AS GRP
FROM table_name2
) T
ORDER BY X, MONTH(SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z))+' 1 1'), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)
See a demo.
I solved this using PostgreSQL. Extract the logic and convert it to your SQL dialect.
DDL statements:
create table demo
(
x varchar(10) not null,
y int not null,
z date);
insert into demo(x,y,z) values
('A',1,'2022-01-01'),
('A',1,'2022-01-02'),
('A',1,'2022-01-03'),
('B',3,'2022-01-01'),
('B',3,'2022-01-02'),
('A',1,'2022-01-05');
Query:
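with base_data as (
  select x, y, z,
         row_number() over (partition by x, y order by z) as sno
  from demo
)
, staging_data as (
  -- gap in days to the previous row; the first row of each partition gets 1
  select x, y, z,
         z - coalesce(lag(z) over (partition by x, y order by z), z - 1::INT) as diff
  from base_data
)
-- rows are grouped by gap size rather than by run: this matches the sample data,
-- but rows from separate runs that share the same gap size fall into one group
select
  x, y, z, row_number() over (partition by x, diff order by z)
from staging_data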
z-1::INT: in SQL Server, use something like DATEADD(day, -1, z) instead. Hopefully that change works in SQL Server.
output:
x|y|z |row_number|
-+-+----------+----------+
A|1|2022-01-01| 1|
A|1|2022-01-02| 2|
A|1|2022-01-03| 3|
A|1|2022-01-05| 1|
B|3|2022-01-01| 1|
B|3|2022-01-02| 2|