Partition by consecutive dates

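I have a table with three columns. X is an identifier. I want a row number partitioned by column Y, but the numbering should continue only while the dates in Z are consecutive and restart whenever a date is skipped. For example, I have this table: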

   X    Y   Z 
   A    1   1-jan
   A    1   2-jan
   A    1   3-jan
   B    3   1-jan
   B    3   2-jan
   A    1   5-jan

The result should look like this:

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  1

The code I'm using now:

  select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn

I get this result (which is not what I want):

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  4  <---- Z is not 4-jan, so this should not be row 4; it should restart at row 1

  

You first need to create the data that you can then use to partition the table.

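The query below uses LAG() to decide whether a row starts a "new partition", then SUM() OVER () to propagate that flag forward and produce a "partition id", and finally ROW_NUMBER() with that id.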

WITH
  gap_marker AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      0,
      1
    )
      AS new_date_range
  FROM
    yourTable
), 
  date_range_partition AS
(
  SELECT
    gap_marker.*,
    SUM(new_date_range) OVER (PARTITION BY y ORDER BY z)   AS date_range_id
  FROM
    gap_marker
)
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z)   AS rn
FROM
  date_range_partition
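
To make the three steps concrete, here is a minimal Python sketch that traces the same logic on the question's sample data. The year 2022 and the names `rows` and `number_islands` are assumptions made for illustration; it mirrors the idea of the query and is not a model of how SQL Server executes it.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample data from the question; the year 2022 is assumed.
rows = [
    ("A", 1, date(2022, 1, 1)),
    ("A", 1, date(2022, 1, 2)),
    ("A", 1, date(2022, 1, 3)),
    ("B", 3, date(2022, 1, 1)),
    ("B", 3, date(2022, 1, 2)),
    ("A", 1, date(2022, 1, 5)),
]

def number_islands(rows):
    """Return (x, y, z, date_range_id, rn) tuples, ordered by y and then z."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, part in groupby(ordered, key=lambda r: r[1]):  # PARTITION BY y
        prev_z = None       # plays the role of LAG(z)
        date_range_id = 0   # running SUM(new_date_range)
        rn = 0
        for x, y, z in part:  # ORDER BY z
            # 1 when the previous date is missing or is not exactly one day earlier
            new_date_range = 0 if prev_z == z - timedelta(days=1) else 1
            date_range_id += new_date_range
            # ROW_NUMBER() within (y, date_range_id): restart on every new range
            rn = 1 if new_date_range else rn + 1
            out.append((x, y, z, date_range_id, rn))
            prev_z = z
    return out

result = number_islands(rows)
for row in result:
    print(row)
```

The 5-jan row gets `date_range_id` 2 inside the y = 1 partition, which is why its row number starts again at 1.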

Alternatively, you can calculate how much to subtract from the current rn so that it resets to 1 whenever a date is skipped.

WITH
  enumerated AS
(
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z)   AS rn,
    DATEDIFF(
      day,
      LAG(z) OVER (PARTITION BY y ORDER BY z),
      z
    )
      AS delta
  FROM
    yourTable
)
SELECT
  x, y, z,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated
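
The offset arithmetic in that last SELECT can be followed step by step in a short Python sketch. It handles a single, already sorted partition; the year 2022 and the names `dates` and `reset_by_offset` are assumptions for illustration only.

```python
from datetime import date

# Dates of the y = 1 partition from the question; the year 2022 is assumed.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 5)]

def reset_by_offset(dates):
    """Return the restarted row numbers for one partition sorted by date."""
    result = []
    offset = 0  # running MAX(IIF(delta = 1, 0, rn - 1))
    for i, z in enumerate(dates):
        rn = i + 1                                          # plain ROW_NUMBER()
        delta = (z - dates[i - 1]).days if i > 0 else None  # DATEDIFF against LAG(z)
        if delta != 1:
            # Every row before this one belongs to an earlier run.
            offset = max(offset, rn - 1)
        result.append(rn - offset)
    return result

restarted = reset_by_offset(dates)
print(restarted)
```

On the 5-jan row the plain row number is 4 and the offset becomes 3, so the reported value is 1.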

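Finally, if your rows are always a whole number of days apart, you can just use DATEDIFF(). Window functions can work out which row the current row should be compared against, which avoids ROW_NUMBER() entirely.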

WITH
  check_previous AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      NULL,
      z
    )
      AS new_base_date
  FROM
    yourTable
)
SELECT
  x, y, z,
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous

Demo of all three: https://dbfiddle.uk/K8x8gOqh

Assuming column Z is a date column, you can try the following:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
  FROM table_name
) T
ORDER BY X, Z
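
The reason the GRP expression works is that, within a run of consecutive dates, the date and the row number both advance by one per row, so their difference stays constant; a skipped date shifts it. The Python sketch below shows this by subtracting the row number (as days) from each date, which groups rows the same way as the DATEDIFF form in the query. The year 2022 and the names `dates` and `group_keys` are assumptions for illustration.

```python
from datetime import date, timedelta

# Dates for X = 'A' from the question; the year 2022 is assumed.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 5)]

def group_keys(dates):
    """Subtract each row's 1-based position, in days, from its date."""
    return [z - timedelta(days=i) for i, z in enumerate(sorted(dates), start=1)]

keys = group_keys(dates)
for z, key in zip(sorted(dates), keys):
    print(z, "->", key)
```

The first three dates share one key and 5-jan gets a different one, so numbering rows within each key restarts after the gap.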

If column Z is not a date type, you can build the groups of consecutive values like this:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) - 
     ROW_NUMBER() OVER (PARTITION BY X ORDER BY SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z)), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)) AS GRP
  FROM table_name2
) T
ORDER BY X, MONTH(SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z))+' 1 1'), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)

See a demo.

I solved this using PostgreSQL. Extract the logic and translate it to your SQL dialect.

DDL statements:

create table demo
(
x varchar(10) not null,
y int not null,
z date);

insert into demo(x,y,z) values
('A',1,'2022-01-01'),
('A',1,'2022-01-02'),
('A',1,'2022-01-03'),
('B',3,'2022-01-01'),
('B',3,'2022-01-02'),
('A',1,'2022-01-05');

Query:

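with staging_data as (
-- days since the previous row; the first row of each (x,y) is treated as 1 day
select x,y,z, z - coalesce(lag(z) over(partition by x,y order by z),z-1::INT) as diff
from demo)
,grouped_data as (
-- running count of gaps: stays constant within each run of consecutive dates
select x,y,z,
sum(case when diff > 1 then 1 else 0 end) over(partition by x,y order by z) as grp
from staging_data)
select 
x,y,z,row_number() over(partition by x,y,grp order by z)
from grouped_data
order by x,z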

In place of z-1::INT, use DATEADD(day, -1, z); with that change this should hopefully work in SQL Server.

output:

x|y|z         |row_number|
-+-+----------+----------+
A|1|2022-01-01|         1|
A|1|2022-01-02|         2|
A|1|2022-01-03|         3|
A|1|2022-01-05|         1|
B|3|2022-01-01|         1|
B|3|2022-01-02|         2|
