How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means that, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that are followed in the partition by a record with some non-zero value should stay.

create table #x (_id int identity, _category int, _date date, _value int)

insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)

insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)

insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)

I see a few ways to do it. The brute-force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if the zero run holds until the category changes. I'm trying to avoid procedural T-SQL, though, if I can do it in a query.

To that end, I thought I could apply some gaps-and-islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it being useful for identifying partitions that meet the criteria, not so much for getting the IDs of the records to delete.

The desired result is the deletion of records 10, 11, 12 and 17.

Appreciate any help.

I'm not sure that your requirement calls for a gaps-and-islands approach. Simple EXISTS logic should work. The query below returns the rows that should be kept:

SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
    EXISTS (
        SELECT 1
        FROM #x x2
        WHERE x2._category = x1._category AND
              x2._date > x1._date AND
              x2._value <> 0
    );
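
Since the question asks for a DELETE rather than a SELECT, the same logic can be inverted. The following is a minimal sketch, assuming the #x table from the question and that _value is never NULL (an assumption; the question does not say). A row is removed when it is zero and no later row in its category is non-zero.

```sql
-- Sketch only: inverts the keep-condition of the SELECT above.
-- Assumes #x exists as defined in the question and _value is never NULL.
DELETE x1
FROM #x x1
WHERE x1._value = 0
  AND NOT EXISTS (
        SELECT 1
        FROM #x x2
        WHERE x2._category = x1._category   -- same partition
          AND x2._date > x1._date           -- strictly later row
          AND x2._value <> 0                -- that is non-zero
  );
```

Against the sample data this removes the rows with _id 10, 11, 12 and 17, while the zero rows that are followed by a non-zero value (_id 2, 7, 14 and 15) remain.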

Assuming that all _values are greater than or equal to 0, you can use the MAX() window function in an updatable CTE:

WITH cte AS (
  SELECT *, 
         MAX(_value) OVER (
           PARTITION BY _category 
           ORDER BY _date 
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) max
  FROM #x
)  
DELETE FROM cte
WHERE max = 0;

If there are negative _values, use MAX(ABS(_value)) instead of MAX(_value).
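
For completeness, the MAX(ABS(_value)) variant mentioned above could look like the sketch below. It assumes the #x table from the question; the column alias max_abs is a name chosen here for illustration. The frame runs from the current row to the end of the partition, so the computed value is 0 only when the current row and every later row in the category are zero.

```sql
-- Sketch only: same updatable-CTE pattern, tolerant of negative values.
-- max_abs is a hypothetical alias; #x is the table from the question.
WITH cte AS (
  SELECT *,
         MAX(ABS(_value)) OVER (
           PARTITION BY _category
           ORDER BY _date
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) AS max_abs
  FROM #x
)
DELETE FROM cte
WHERE max_abs = 0;   -- current row and all following rows are zero
```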

See the demo.

Using common table expressions, you can use:

WITH CTE_NumberedRows AS (
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
    FROM #x
),
CTE_Keepers AS (
    SELECT _category, rnLastKeeper = MAX(rn)
    FROM CTE_NumberedRows
    WHERE _value <> 0
    GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
    ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)

See this db<>fiddle for a working demo.

EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks. (The original can still be found here.)
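
To see why the LEFT JOIN and ISNULL matter for that edge case, here is a rough preview query. It assumes the #x table from the question plus a hypothetical category 4 that contains only zeros (added here purely for illustration). It lists the rows the DELETE would remove without deleting anything.

```sql
-- Hypothetical all-zero category, added only to exercise the edge case.
INSERT INTO #x VALUES (4, '2022-10-20', 0);
INSERT INTO #x VALUES (4, '2022-10-21', 0);

WITH CTE_NumberedRows AS (
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
    FROM #x
),
CTE_Keepers AS (
    -- Category 4 has no non-zero rows, so it produces no row here.
    SELECT _category, rnLastKeeper = MAX(rn)
    FROM CTE_NumberedRows
    WHERE _value <> 0
    GROUP BY _category
)
-- Same join and filter as the DELETE, but only selecting.
SELECT NR._id, NR._category, NR._date, NR.rn, K.rnLastKeeper
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
    ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)
ORDER BY NR._category, NR.rn;
```

For the all-zero category, rnLastKeeper comes back NULL, ISNULL turns it into 0, and every row satisfies rn > 0, so the whole category is selected. With an INNER JOIN those rows would be dropped from the result and would survive the delete.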

Tim Biegeleisen's post may be the simpler approach.
