SQL to bulk-identify certain records and then delete them in SQL Server 2008

I have a weatherForecast table. Every hour, around 700 postcodes' worth of forecasts are inserted. These come in via three web services: one at .40 past, one at .50 past and one at .55 past the hour.

This is all working fine, except that my code, which traverses the XML and inserts it into our SQL Server 2008 R2 database, has not been checking for duplicate postcodes per hour. A duplicate is defined as a postcode that appears in more than one of the three web services' feeds. I have now fixed that; my question is about how to deal with the records that have already slipped through.

For example, postcode 6330 was in both the .40 and the .50 past the hour feeds, so it has two forecasts for each hour (and has had for a long period; the table currently holds 9 million records). Removing these particular records is easy. To keep the .40 past records and delete the .50 past ones, I think I can do this:

delete from Weather_Opticast_Forecast where lPostCode = 6330 and datePart(minute, recordCreated) = 50

Or can anyone see an issue with this? IMO it's safe to say the .40 past records will stay and the .50 past records will be deleted, correct?
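One issue with the single-postcode delete is that it removes every .50 row for 6330, including any hour in which 6330 arrived only via the .50 feed, which would leave that hour with no forecast at all. A minimal sketch of a guarded version deletes a .50 row only when a .40 row exists for the same postcode, date and hour. It uses a temporary table, #Forecast, with made-up rows as a stand-in for Weather_Opticast_Forecast; only lPostCode and recordCreated come from the question, and the ID column is added purely for the demo.

```sql
-- Hypothetical stand-in for Weather_Opticast_Forecast (sample rows are made up)
if object_id('tempdb..#Forecast') is not null drop table #Forecast
create table #Forecast (ID int not null,
                        lPostCode int not null,
                        recordCreated datetime not null)
insert into #Forecast (ID, lPostCode, recordCreated) values
(1, 6330, '2015-06-24T09:40:05'),  -- .40 feed
(2, 6330, '2015-06-24T09:50:10'),  -- .50 feed, same hour as ID 1: a duplicate
(3, 6330, '2015-06-24T10:50:07')   -- .50 feed only: the only row for 10:00-10:59

-- Delete a .50 row only when a .40 row exists for the same postcode,
-- date and clock hour, so hours covered solely by the .50 feed keep a forecast
delete f50
from #Forecast as f50
where f50.lPostCode = 6330
  and datepart(minute, f50.recordCreated) = 50
  and exists (select 1
              from #Forecast as f40
              where f40.lPostCode = f50.lPostCode
                and datepart(minute, f40.recordCreated) = 40
                and convert(date, f40.recordCreated) = convert(date, f50.recordCreated)
                and datepart(hour, f40.recordCreated) = datepart(hour, f50.recordCreated))

select ID, lPostCode, recordCreated from #Forecast order by ID
```

This keeps the exact-minute test from the question, so any .50-feed rows stamped in minute 51 or later (if an insert ever ran that long) would still not be matched; checking the spread of `datepart(minute, recordCreated)` values for the postcode first would show whether that happens.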

There are two postcodes we know about. What about the ones we don't? Do I have to write code to detect these, or can SQL handle it? I want a query that answers: out of the roughly 700 records (postcodes) inserted every hour, are there any postcodes with more than one record inserted within the same 60-minute period?

Hopefully I have explained my two questions clearly. Ideally, I would like to handle both the identification and the deletion of these records in pure SQL.
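Detecting the postcodes you don't know about can be done in SQL alone. Because the .40, .50 and .55 feeds all arrive within the same clock hour, grouping by postcode plus the date and hour of recordCreated, and keeping the groups with more than one row, lists every postcode/hour that received duplicates. This is a sketch that assumes each feed inserts one row per postcode per hour (as in the answer's sample data); the @Forecast table variable and its rows are made up and stand in for Weather_Opticast_Forecast.

```sql
-- Hypothetical stand-in for Weather_Opticast_Forecast (sample rows are made up)
declare @Forecast table (ID int not null,
                         lPostCode int not null,
                         recordCreated datetime not null)
insert into @Forecast (ID, lPostCode, recordCreated) values
(1, 6330, '2015-06-24T09:40:05'),
(2, 6330, '2015-06-24T09:50:10'),
(3, 2117, '2015-06-24T09:55:01'),
(4, 2117, '2015-06-24T10:40:03'),
(5, 2117, '2015-06-24T10:55:02')

-- Every postcode / calendar date / clock hour combination with more than one row
select lPostCode,
       convert(date, recordCreated)  as CreatedDate,
       datepart(hour, recordCreated) as CreatedHour,
       count(*)                      as RowsInHour
from @Forecast
group by lPostCode,
         convert(date, recordCreated),
         datepart(hour, recordCreated)
having count(*) > 1
order by lPostCode, CreatedDate, CreatedHour
```

With these sample rows the query returns two groups: 2117 at hour 10 and 6330 at hour 9. Selecting `distinct lPostCode` from this result would give just the list of affected postcodes.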

You can use ROW_NUMBER() within a CTE to identify the later duplicates within each hour and then remove them:

declare @t table (ID int not null,
                  CreatedAt datetime not null,
                  PostCode varchar(19) not null)
insert into @t (ID,CreatedAt,PostCode) values
(1,'2015-06-24T09:40:00',6884),
(2,'2015-06-24T09:51:00',6884),
(3,'2015-06-24T10:30:00',2117),
(4,'2015-06-24T10:30:01',2117),
(5,'2015-06-24T10:30:02',6884)

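;With Selected as (
    -- Number the rows within each postcode + calendar date + clock hour,
    -- earliest CreatedAt first
    select *,ROW_NUMBER() OVER (
                PARTITION BY
                            PostCode,
                            CONVERT(date,CreatedAt),
                            DATEPART(hour,CreatedAt)
                ORDER BY CreatedAt) as rn
    from @t
)
-- Deleting through the CTE removes the underlying rows from @t:
-- every row after the first in its postcode/hour group is a later duplicate
delete from Selected where rn > 1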

select * from @t

Results:

ID          CreatedAt               PostCode
----------- ----------------------- -------------------
1           2015-06-24 09:40:00.000 6884
3           2015-06-24 10:30:00.000 2117
5           2015-06-24 10:30:02.000 6884
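On a table of around 9 million rows, removing every duplicate in one statement can hold locks and grow the transaction log for a long time. One common pattern, sketched here under the assumption that the CTE above is otherwise unchanged, is to repeat a DELETE TOP (n) through the same CTE until it removes nothing. The @BatchSize value is arbitrary, and ID is added to the ORDER BY only so that the choice between rows with identical CreatedAt values is deterministic. Running `select * from Selected where rn > 1` in place of the delete previews what would be removed.

```sql
-- Sample data in the same shape as the answer's @t
declare @t table (ID int not null,
                  CreatedAt datetime not null,
                  PostCode varchar(19) not null)
insert into @t (ID, CreatedAt, PostCode) values
(1,'2015-06-24T09:40:00','6884'),
(2,'2015-06-24T09:51:00','6884'),
(3,'2015-06-24T10:30:00','2117'),
(4,'2015-06-24T10:30:01','2117'),
(5,'2015-06-24T10:30:02','6884')

declare @BatchSize int = 2   -- tiny for the demo; pick a larger value for the real table
declare @Deleted int = 1

-- Keep deleting later duplicates in batches until a pass removes nothing
while @Deleted > 0
begin
    ;with Selected as (
        select *, ROW_NUMBER() OVER (
                      PARTITION BY PostCode,
                                   CONVERT(date, CreatedAt),
                                   DATEPART(hour, CreatedAt)
                      ORDER BY CreatedAt, ID) as rn   -- ID breaks ties deterministically
        from @t
    )
    delete top (@BatchSize) from Selected where rn > 1

    set @Deleted = @@ROWCOUNT
end

select * from @t order by ID
```

If each feed writes more than one row per postcode (for example one row per forecast period), the column identifying that period also has to go into the PARTITION BY; otherwise the delete would keep only one forecast per postcode per hour and discard legitimate rows.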
