
How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means that, when the records are placed in _date order, any contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that are followed in the partition by a record with a non-zero value should stay.

create table #x (_id int identity, _category int, _date date, _value int)

insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)

insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)

insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)

I see a few ways to do it. The brute-force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if the zero value holds until the category changes. I'm trying to avoid procedural T-SQL, though, if I can do it in a single query.

To that end, I thought I could apply some gaps-and-islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it being useful for identifying partitions that meet the criteria, not so much for getting the IDs of the records to delete.

The desired result is the deletion of records 10, 11, 12 and 17.

Appreciate any help.

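I'm not sure that your requirement calls for a gaps-and-islands approach. Simple EXISTS logic should work. The query below returns the records to keep: every non-zero record, plus any zero record that is followed by a later non-zero record in the same category.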

SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
    EXISTS (
        SELECT 1
        FROM #x x2
        WHERE x2._category = x1._category AND
              x2._date > x1._date AND
              x2._value <> 0
    );
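Since the question asks for a deletion, the same EXISTS logic can be inverted into a DELETE. This is only a sketch, assuming _value is never NULL and that the #x table from the question is in scope; it targets zero-valued records that have no later non-zero record in the same category.

```sql
-- Delete trailing zero records: a zero row is "trailing" when no later
-- row in the same category has a non-zero value.
-- Assumption: _value is NOT NULL (NULL rows would be left untouched here).
DELETE x1
FROM #x AS x1
WHERE x1._value = 0
  AND NOT EXISTS (
        SELECT 1
        FROM #x AS x2
        WHERE x2._category = x1._category
          AND x2._date > x1._date
          AND x2._value <> 0
      );
```

Against the sample data in the question, this should remove records 10, 11, 12 and 17. Replacing `DELETE x1` with `SELECT x1.*` previews the affected rows before anything is deleted.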

Assuming that all _value values are greater than or equal to 0, you can use the MAX() window function in an updatable CTE:

WITH cte AS (
  SELECT *, 
         MAX(_value) OVER (
           PARTITION BY _category 
           ORDER BY _date 
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) max
  FROM #x
)  
DELETE FROM cte
WHERE max = 0;

If there are negative _value values, use MAX(ABS(_value)) instead of MAX(_value).
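For reference, the variant with ABS() might look like the following. It is a sketch of the substitution described above, with the column alias renamed to max_abs (a name chosen here for illustration), and it assumes _date is unique within each category so that the ROWS frame is deterministic.

```sql
WITH cte AS (
  SELECT *,
         -- Largest absolute value from the current row to the end of the
         -- category; 0 means this row and every later row are zero.
         MAX(ABS(_value)) OVER (
           PARTITION BY _category
           ORDER BY _date
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) AS max_abs
  FROM #x
)
DELETE FROM cte
WHERE max_abs = 0;
```

A row is deleted only when it and every later row in its category are zero, which is the definition of a trailing block in the question.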

See the demo.

Using common table expressions, you can use:

WITH CTE_NumberedRows AS (
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
    FROM #x
),
CTE_Keepers AS (
    SELECT _category, rnLastKeeper = MAX(rn)
    FROM CTE_NumberedRows
    WHERE _value <> 0
    GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
    ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)

See this db<>fiddle for a working demo.

EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks. (The original can still be found here.)

Tim Biegeleisen's post may be the simpler approach.
