Query to delete records older than n active dates from each group

I have a dataset that I need to prune on a daily basis. It is populated from a process that writes records into a table periodically.

I currently have a simple query that does this:

DELETE FROM dataTable WHERE entryDate < dateadd(day, -5, GETDATE())

But the problem is that the process is unreliable; there may be days where no data is written at all.

So what I really need is a query that, for each category, goes back 5 (possibly non-consecutive) days on which data was written, rather than 5 calendar days.

For example, if I run the following query:

SELECT category, cast(entryDate as date) as LogDate
  FROM dataTable
  group by category, cast(entryDate as date)
  order by category desc, cast(entryDate as date) desc

I might get as a result:

Category    Date
Foo        2015-11-30
Foo        2015-11-29
Foo        2015-11-26
Foo        2015-11-25
Foo        2015-11-21
Foo        2015-11-20  <-- Start pruning here, not the 25th.
Foo        2015-11-19
Foo        2015-11-18

Bar        2015-11-30
Bar        2015-11-29
Bar        2015-11-28
Bar        2015-11-27
Bar        2015-11-26
Bar        2015-11-25  <-- For Bar, pruning can start at the 25th.
Bar        2015-11-24
Bar        2015-11-23

For Foo, the query needs to go all the way back to the 20th before it starts deleting.
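The rule can be checked outside the database first. Below is a minimal pure-Python sketch (the function names `active_day_cutoffs` and `rows_to_prune` are hypothetical, and the data is the listing above): it collects each category's distinct dates, takes the 5th newest as the oldest day to keep, and flags everything older.

```python
from datetime import date

# Sample (category, day) pairs taken from the listing in the question.
SAMPLE = [("Foo", date(2015, 11, d)) for d in (30, 29, 26, 25, 21, 20, 19, 18)]
SAMPLE += [("Bar", date(2015, 11, d)) for d in (30, 29, 28, 27, 26, 25, 24, 23)]


def active_day_cutoffs(rows, keep=5):
    """Map each category to the oldest day to keep, counting only days with data.

    Assumes keep >= 1.
    """
    days_by_cat = {}
    for cat, day in rows:
        days_by_cat.setdefault(cat, set()).add(day)
    cutoffs = {}
    for cat, days in days_by_cat.items():
        newest_first = sorted(days, reverse=True)
        # A category with fewer than `keep` active days keeps all of them.
        cutoffs[cat] = newest_first[min(keep, len(newest_first)) - 1]
    return cutoffs


def rows_to_prune(rows, keep=5):
    """Return the rows dated before their category's cutoff."""
    cutoffs = active_day_cutoffs(rows, keep)
    return [(cat, day) for cat, day in rows if day < cutoffs[cat]]


if __name__ == "__main__":
    print(active_day_cutoffs(SAMPLE))
    print(rows_to_prune(SAMPLE))
```

On this data Foo is kept back to 2015-11-21 and Bar back to 2015-11-26, so pruning starts at the 20th for Foo and at the 25th for Bar, which is exactly the behaviour asked for.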

You can use row_number to number each category's rows from newest to oldest and then delete based on the generated numbers. This counts active days only if there is at most one row per category per day.


with rownums as (SELECT row_number() over(partition by category order by cast(entryDate as date) desc) as rn
                 ,*
                 FROM dataTable
)
delete from rownums where rn > 5 --keeps each category's 5 most recent rows (one row per day assumed)
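To try the row_number idea without SQL Server, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes SQLite 3.25 or later (needed for window functions). Two dialect differences are worth flagging: `date(entryDate)` stands in for `cast(entryDate as date)`, and because SQLite cannot DELETE through a CTE the way SQL Server can, the CTE exposes `rowid` and the delete filters on it. Table and column names come from the question; `build_demo` and `remaining_days` are hypothetical helpers.

```python
import sqlite3

# SQLite cannot DELETE through a CTE as SQL Server does, so the CTE exposes
# each row's rowid and the DELETE removes the rowids ranked past `keep`.
PRUNE_ROW_NUMBER = """
WITH rownums AS (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY date(entryDate) DESC) AS rn
    FROM dataTable
)
DELETE FROM dataTable
WHERE rowid IN (SELECT rid FROM rownums WHERE rn > ?)
"""


def build_demo():
    """In-memory copy of the question's sample: one entry per active day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataTable (category TEXT, entryDate TEXT)")
    rows = [("Foo", f"2015-11-{d:02d} 08:00:00") for d in (30, 29, 26, 25, 21, 20, 19, 18)]
    rows += [("Bar", f"2015-11-{d:02d} 08:00:00") for d in (30, 29, 28, 27, 26, 25, 24, 23)]
    conn.executemany("INSERT INTO dataTable VALUES (?, ?)", rows)
    return conn


def remaining_days(conn, category):
    """Distinct days still present for one category, newest first."""
    cur = conn.execute(
        "SELECT DISTINCT date(entryDate) FROM dataTable "
        "WHERE category = ? ORDER BY 1 DESC",
        (category,),
    )
    return [r[0] for r in cur]


if __name__ == "__main__":
    conn = build_demo()
    conn.execute(PRUNE_ROW_NUMBER, (5,))
    print(remaining_days(conn, "Foo"))
    print(remaining_days(conn, "Bar"))
```

With exactly one row per day, Foo keeps 2015-11-30 back to 2015-11-21 and Bar keeps 2015-11-30 back to 2015-11-26.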

If there can be multiple entries per day, use dense_rank instead, so that all rows from the same day share one number.

with rownums as (SELECT dense_rank() over(partition by category order by cast(entryDate as date) desc) as rn
                     ,*
                 FROM dataTable)
delete from rownums where rn > 5;
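The difference between the two ranking functions shows up as soon as a day has more than one row. The sketch below again uses Python's `sqlite3` as a stand-in for SQL Server (assuming SQLite 3.25+ and the same rowid-based delete as before) and gives Foo two log entries per active day; `build_demo` and `kept_days` are hypothetical helpers.

```python
import sqlite3

# {func} is filled with ROW_NUMBER or DENSE_RANK; only these two constants are
# ever substituted, so formatting the SQL text is safe here.
PRUNE_RANKED = """
WITH rownums AS (
    SELECT rowid AS rid,
           {func}() OVER (PARTITION BY category
                          ORDER BY date(entryDate) DESC) AS rn
    FROM dataTable
)
DELETE FROM dataTable
WHERE rowid IN (SELECT rid FROM rownums WHERE rn > ?)
"""


def build_demo():
    """Foo's active days from the question, with two log entries per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataTable (category TEXT, entryDate TEXT)")
    for d in (30, 29, 26, 25, 21, 20, 19, 18):
        for t in ("08:00:00", "17:00:00"):
            conn.execute("INSERT INTO dataTable VALUES ('Foo', ?)", (f"2015-11-{d:02d} {t}",))
    return conn


def kept_days(func, keep=5):
    """Prune a fresh demo table with the given ranking function; return surviving days."""
    assert func in ("ROW_NUMBER", "DENSE_RANK")
    conn = build_demo()
    conn.execute(PRUNE_RANKED.format(func=func), (keep,))
    cur = conn.execute("SELECT DISTINCT date(entryDate) FROM dataTable ORDER BY 1 DESC")
    return [r[0] for r in cur]


if __name__ == "__main__":
    print("ROW_NUMBER:", kept_days("ROW_NUMBER"))
    print("DENSE_RANK:", kept_days("DENSE_RANK"))
```

With two rows per day, ROW_NUMBER reaches 5 partway through the third day, so only 2015-11-30, 2015-11-29 and one row of 2015-11-26 survive. DENSE_RANK gives both rows of a day the same number and keeps all five active days.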

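Alternatively, number the distinct dates within each category, then delete every row whose date falls outside the most recent five.

;WITH orderedDates (category, LogDate, RowNum)
AS 
(
SELECT category,
       CAST(entryDate AS date) AS LogDate,
       ROW_NUMBER() OVER (PARTITION BY category
                          ORDER BY CAST(entryDate AS date) DESC) AS RowNum
FROM dataTable
GROUP BY category, CAST(entryDate AS date)
)
DELETE d
FROM dataTable AS d
JOIN orderedDates AS o
  ON o.category = d.category
 AND o.LogDate = CAST(d.entryDate AS date)
WHERE o.RowNum > 5 --or however many days you need to keep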
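Because the dates are grouped before they are numbered, this approach tolerates several rows per day without needing dense_rank. Below is a minimal sketch of the same idea in Python's `sqlite3`, assuming SQLite 3.25+ for window functions and 3.15+ for row values. SQLite's DELETE has no JOIN form, so a row-value `IN` over `(category, day)` stands in for T-SQL's `DELETE ... JOIN`. `build_demo` and `summary` are hypothetical helpers.

```python
import sqlite3

# Rank each category's distinct days (one row per day after GROUP BY), then
# delete rows whose (category, day) pair is ranked past `keep`.
PRUNE_BY_DAY = """
WITH orderedDates AS (
    SELECT category,
           date(entryDate) AS LogDate,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY date(entryDate) DESC) AS RowNum
    FROM dataTable
    GROUP BY category, date(entryDate)
)
DELETE FROM dataTable
WHERE (category, date(entryDate)) IN (
    SELECT category, LogDate FROM orderedDates WHERE RowNum > ?
)
"""


def build_demo():
    """Question's sample data; Foo logs twice per active day, Bar once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataTable (category TEXT, entryDate TEXT)")
    rows = [("Foo", f"2015-11-{d:02d} {t}")
            for d in (30, 29, 26, 25, 21, 20, 19, 18)
            for t in ("08:00:00", "17:00:00")]
    rows += [("Bar", f"2015-11-{d:02d} 08:00:00") for d in (30, 29, 28, 27, 26, 25, 24, 23)]
    conn.executemany("INSERT INTO dataTable VALUES (?, ?)", rows)
    return conn


def summary(conn):
    """(category, day, row count) for the rows that remain, newest first."""
    return conn.execute(
        "SELECT category, date(entryDate), COUNT(*) FROM dataTable "
        "GROUP BY category, date(entryDate) "
        "ORDER BY category DESC, date(entryDate) DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    conn.execute(PRUNE_BY_DAY, (5,))
    for row in summary(conn):
        print(row)
```

After pruning, Foo keeps both rows on each of its five newest days (back to 2015-11-21) and Bar keeps 2015-11-30 back to 2015-11-26.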
