Partition by consecutive dates

I have a table with three columns (X, Y, and Z), X being the unique identifier. I want to get the row number when I partition by column Y, but only while Z is in consecutive order. For example, I have this table:

   X    Y   Z 
   A    1   1-jan
   A    1   2-jan
   A    1   3-jan
   B    3   1-jan
   B    3   2-jan
   A    1   5-jan

The result should look like this:

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  1

The code I am using right now:

  select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn

I am getting this as my result (this is not the result I want):

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  4  <---- Z is not 4-jan, so this should not be row 4; it should start again at row 1

  

You first need to create data that can be used to partition your table.

The query below uses LAG() to determine whether a row starts a "new partition", then SUM() OVER () to propagate that flag forward as a running total, giving a "partition id", and finally uses ROW_NUMBER() partitioned by that identifier.

WITH
  gap_marker AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      0,
      1
    )
      AS new_date_range
  FROM
    yourTable
), 
  date_range_partition AS
(
  SELECT
    gap_marker.*,
    SUM(new_date_range) OVER (PARTITION BY y ORDER BY z)   AS date_range_id
  FROM
    gap_marker
)
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z)   AS rn
FROM
  date_range_partition
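
To try the query above, the sample rows from the question can be loaded roughly as follows. This is only a sketch: the column types and the year 2022 are assumptions (the question gives neither), and the second statement simply exposes the two helper columns so the steps can be followed.

```sql
-- Assumed setup (T-SQL): column types and the year 2022 are guesses, not from the question.
CREATE TABLE yourTable (
  x VARCHAR(10) NOT NULL,
  y INT         NOT NULL,
  z DATE        NOT NULL
);

INSERT INTO yourTable (x, y, z) VALUES
  ('A', 1, '2022-01-01'),
  ('A', 1, '2022-01-02'),
  ('A', 1, '2022-01-03'),
  ('B', 3, '2022-01-01'),
  ('B', 3, '2022-01-02'),
  ('A', 1, '2022-01-05');

-- Show the "new range" flag and the running total derived from it.
WITH gap_marker AS (
  SELECT
    yourTable.*,
    -- 1 when the previous date in the partition is missing or not exactly one day earlier
    IIF(LAG(z) OVER (PARTITION BY y ORDER BY z) = DATEADD(day, -1, z), 0, 1) AS new_date_range
  FROM yourTable
)
SELECT
  x, y, z,
  new_date_range,
  SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
FROM gap_marker
ORDER BY y, z;
```

With these rows, new_date_range is 1 for 1-jan and 5-jan in y = 1 and for 1-jan in y = 3, so date_range_id comes out as 1, 1, 1, 2 for y = 1 and 1, 1 for y = 3.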

Alternatively, you could calculate an amount to deduct from the current rn, so that it resets to 1 whenever a date is skipped.

WITH
  enumerated AS
(
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z)   AS rn,
    DATEDIFF(
      day,
      LAG(z) OVER (PARTITION BY y ORDER BY z),
      z
    )
      AS delta
  FROM
    yourTable
)
SELECT
  x, y, z,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated
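
To see why the deduction works, it may help to look at the helper values on their own. The sketch below assumes yourTable holds the question's six rows with z stored as DATE; the column name amount_to_deduct is made up for illustration.

```sql
WITH enumerated AS (
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
    DATEDIFF(day, LAG(z) OVER (PARTITION BY y ORDER BY z), z) AS delta
  FROM yourTable
)
SELECT
  x, y, z, rn, delta,
  -- rn - 1 on the first row of each run, 0 elsewhere; the running MAX carries
  -- the most recent run start forward because rn only ever increases
  MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS amount_to_deduct
FROM enumerated
ORDER BY y, z;
```

For y = 1 the rn values are 1, 2, 3, 4 and delta is NULL, 1, 1, 2, so the amount to deduct is 0, 0, 0, 3 and the final numbering is 1, 2, 3, 1.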

Finally, you could use DATEDIFF() if your rows are always whole days apart. Window functions can be used to work out which date the current row should be compared against (the start of its run), avoiding ROW_NUMBER() altogether.

WITH
  check_previous AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      NULL,
      z
    )
      AS new_base_date
  FROM
    yourTable
)
SELECT
  x, y, z,
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous
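
The same kind of trace can be made for this variant. Again assuming yourTable contains the question's rows with z as DATE, the sketch below shows the base date each row ends up being measured from; run_start is an illustrative name.

```sql
WITH check_previous AS (
  SELECT
    yourTable.*,
    -- NULL inside a run, the row's own date where a new run begins
    IIF(LAG(z) OVER (PARTITION BY y ORDER BY z) = DATEADD(day, -1, z), NULL, z) AS new_base_date
  FROM yourTable
)
SELECT
  x, y, z, new_base_date,
  -- MAX skips the NULLs, so each row picks up the latest run start at or before it
  MAX(new_base_date) OVER (PARTITION BY y ORDER BY z) AS run_start
FROM check_previous
ORDER BY y, z;
```

For y = 1, run_start is 1-jan for the first three rows and 5-jan for the last, which gives 1, 2, 3, 1 once the day difference plus one is taken.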

Demo of all three: https://dbfiddle.uk/K8x8gOqh

Supposing that column Z is a date column, you could try the following:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
  FROM table_name
) T
ORDER BY X, Z
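
The grouping above relies on the fact that a date minus its row number stays constant within a run of consecutive days and changes when a day is skipped. A sketch that makes the group key visible as a date, assuming the same table_name with Z as DATE; the DATEADD form and the name GRP_ANCHOR are substitutions for illustration, not the query's own expression:

```sql
SELECT X, Y, Z,
  -- subtract the row number (in days) from the date; rows in one run share the result
  DATEADD(DAY, -CAST(ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z) AS INT), Z) AS GRP_ANCHOR
FROM table_name
ORDER BY X, Z;
```

If the sample dates are taken to be in 2022, the three consecutive A rows all map to 2021-12-31, the 5-jan row maps to 2022-01-01, and both B rows map to 2021-12-31, so partitioning by X and this value separates the runs.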

If the Z column's datatype is not date, you can generate the groups of consecutive values as follows:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) - 
     ROW_NUMBER() OVER (PARTITION BY X ORDER BY SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z)), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)) AS GRP
  FROM table_name2
) T
ORDER BY X, MONTH(SUBSTRING(Z, CHARINDEX('-', Z)+1, LEN(Z))+' 1 1'), CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)

See a demo.

I solved this problem using PostgreSQL. Extract the logic and convert it into your SQL dialect.

DDL statement:

create table demo
(
x varchar(10) not null,
y int not null,
z date);

insert into demo(x,y,z) values
('A',1,'2022-01-01'),
('A',1,'2022-01-02'),
('A',1,'2022-01-03'),
('B',3,'2022-01-01'),
('B',3,'2022-01-02'),
('A',1,'2022-01-05');

Query:

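with base_data as (
select x,y,z,
row_number() over(partition by x,y order by z) as sno
from demo
)
,staging_data as  (
-- diff is the gap in days to the previous row; the first row of each (x,y) gets 1
select x,y,z, z - coalesce(lag(z) over(partition by x,y order by z),z-1::INT) as diff
from base_data)
-- note: partitioning by diff groups every row with the same gap together, so a later
-- run of two or more rows would continue the numbering of the first run
select 
x,y,z,row_number() over(partition by x,diff order by z)
from staging_data
order by x,z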

Note: z-1::INT is PostgreSQL syntax. In SQL Server, use DATEADD(day, -1, z) instead, and DATEDIFF(day, ..., z) in place of the date subtraction. Hopefully this change will work in SQL Server.

Output:

x|y|z         |row_number|
-+-+----------+----------+
A|1|2022-01-01|         1|
A|1|2022-01-02|         2|
A|1|2022-01-03|         3|
A|1|2022-01-05|         1|
B|3|2022-01-01|         1|
B|3|2022-01-02|         2|
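
Partitioning by the day difference gives the right numbers for these six rows, but two separate multi-day runs for the same x would share diff = 1 and be numbered together. A sketch of a variant that gives each run its own id with a running sum, assuming the demo table above (PostgreSQL; run_start and run_id are illustrative names):

```sql
with marked as (
  select x, y, z,
         -- 0 when the previous date is exactly one day earlier, otherwise 1 (a new run)
         case when z - lag(z) over (partition by x, y order by z) = 1
              then 0 else 1 end as run_start
  from demo
),
grouped as (
  select x, y, z,
         -- the running total of run starts gives every run a distinct id
         sum(run_start) over (partition by x, y order by z) as run_id
  from marked
)
select x, y, z,
       row_number() over (partition by x, y, run_id order by z) as rn
from grouped
order by x, z;
```

On the demo rows this returns the same numbering as the output above.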
