I have some dirty resource usage records in the table t_resourcetable, which looks like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
I need those dirty rows to be merged like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
The result should be written back to the same table. I have more than 40k rows, so I cannot use a cursor. Please help me clean this up with more efficient SQL statements.
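The solution provided does not account for a scenario like this, where the same resNo and subres have two separate runs of intervals with a gap between them: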
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-15 00:00:00.000
1      2       2012-01-15 00:00:00.000  2012-01-16 00:00:00.000
1      2       2012-01-16 00:00:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
In this case, I need the rows to be merged like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
Please assist me with this dirty data problem.
MERGE INTO t_resourcetable AS TARGET
USING (
    SELECT
        resNo, subres,
        MIN(startdate) as startdate,
        MAX(enddate) as enddate
    FROM t_resourcetable
    GROUP BY resNo, subres
) AS SOURCE
ON TARGET.resNo = SOURCE.resNo
    AND TARGET.subres = SOURCE.subres
    AND TARGET.startdate = SOURCE.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
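Note that grouping only by resNo and subres collapses every row of a group into a single span, so a gap between two separate runs is swallowed. The following is a minimal Python sketch of that effect, not part of the original answer; `collapse_by_group` and `ts` are hypothetical helper names, and the rows are the (resNo 1, subres 2) rows from the second sample in the question.

```python
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def collapse_by_group(rows):
    """Mimic GROUP BY resNo, subres with MIN(startdate), MAX(enddate)."""
    spans = {}
    for res_no, subres, start, end in rows:
        key = (res_no, subres)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, start), max(hi, end))
        else:
            spans[key] = (start, end)
    return spans


rows = [
    (1, 2, ts("2012-01-02 22:03"), ts("2012-01-03 00:00")),
    (1, 2, ts("2012-01-03 00:00"), ts("2012-01-04 00:00")),
    (1, 2, ts("2012-01-04 00:00"), ts("2012-01-04 16:23")),
    # Ten days later: a separate run for the same resNo and subres.
    (1, 2, ts("2012-01-14 10:09"), ts("2012-01-15 00:00")),
    (1, 2, ts("2012-01-15 00:00"), ts("2012-01-16 00:00")),
    (1, 2, ts("2012-01-16 00:00"), ts("2012-01-16 03:00")),
]

spans = collapse_by_group(rows)
# The single span runs from the first start to the last end,
# covering the unused period between 2012-01-04 and 2012-01-14.
print(spans[(1, 2)])
```

This is why the plain GROUP BY version only suits data where each (resNo, subres) pair has one contiguous run.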
Edit: To respect the gaps in the intervals:
MERGE INTO t_resourcetable AS TARGET
USING (
    -- Find the first item in each interval group
    SELECT
        resNo, subres, startdate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from behind
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
            AND t2.subres = t1.subres
            AND t2.startdate < t1.startdate
            AND t2.enddate >= t1.startdate
    )
) AS SOURCE_start
INNER JOIN (
    -- Find the last item in each interval group
    SELECT
        resNo, subres, enddate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from ahead
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
            AND t2.subres = t1.subres
            AND t2.startdate <= t1.enddate
            AND t2.enddate > t1.enddate
    )
) AS SOURCE_end
    ON SOURCE_start.resNo = SOURCE_end.resNo
    AND SOURCE_start.subres = SOURCE_end.subres
    AND SOURCE_start.rn = SOURCE_end.rn -- Join by row number
ON TARGET.resNo = SOURCE_start.resNo
    AND TARGET.subres = SOURCE_start.subres
    AND TARGET.startdate = SOURCE_start.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE_end.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
Result:
resNo subres startdate enddate
1 2 2012-01-02 22:03 2012-01-04 16:23
1 2 2012-01-14 10:09 2012-01-16 03:00
1 3 2012-01-06 16:23 2012-01-06 22:23
2 2 2012-01-04 05:23 2012-01-06 16:23
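To check the expected result outside the database, the same rule (rows of one resNo and subres belong together when they overlap or touch) can be sketched in Python. This is only an illustration under that assumption, not a replacement for the MERGE statement; `merge_intervals` and `ts` are hypothetical names.

```python
from datetime import datetime
from itertools import groupby


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def merge_intervals(rows):
    """Merge rows that overlap or touch within each (resNo, subres) group."""
    merged = []
    # Sort by key, then by start, so each group's intervals arrive in order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (res_no, subres), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        cur_start = cur_end = None
        for _, _, start, end in group:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps or touches the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current run and start a new one.
                merged.append((res_no, subres, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((res_no, subres, cur_start, cur_end))
    return merged


rows = [
    (1, 2, ts("2012-01-02 22:03"), ts("2012-01-03 00:00")),
    (1, 2, ts("2012-01-03 00:00"), ts("2012-01-04 00:00")),
    (1, 2, ts("2012-01-04 00:00"), ts("2012-01-04 16:23")),
    (1, 2, ts("2012-01-14 10:09"), ts("2012-01-15 00:00")),
    (1, 2, ts("2012-01-15 00:00"), ts("2012-01-16 00:00")),
    (1, 2, ts("2012-01-16 00:00"), ts("2012-01-16 03:00")),
    (1, 3, ts("2012-01-06 16:23"), ts("2012-01-06 22:23")),
    (2, 2, ts("2012-01-04 05:23"), ts("2012-01-06 16:23")),
]

result = merge_intervals(rows)
for row in result:
    print(row)
```

The sketch sorts once and makes a single pass per group, whereas the SQL version finds run starts and run ends with NOT EXISTS and pairs them by row number.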
Edit: If there is any risk of concurrent edits on the target table, you might want to add the HOLDLOCK hint. This will prevent any primary key violation errors and be slightly more resource efficient (thanks, Joey):
MERGE INTO t_resourcetable WITH (HOLDLOCK) AS TARGET
...
For SQL Server 2005 you could do something like this:
create table #temp
(
    resNo int,
    subres int,
    enddate datetime,
    primary key (resNo, subres)
)

-- Store the values you need for enddate in a temp table
insert into #temp
select resNo,
       subres,
       max(enddate) as enddate
from t_resourcetable
group by resNo, subres

-- Delete duplicates keeping the row with min startdate
delete T
from (
    select row_number() over(partition by resNo, subres order by startdate) as rn
    from t_resourcetable
) as T
where rn > 1

-- Set enddate where needed
update T set enddate = tmp.enddate
from t_resourcetable as T
    inner join #temp as tmp
        on T.resNo = tmp.resNo and
           T.subres = tmp.subres
where T.enddate <> tmp.enddate

drop table #temp
You could first store the result in a table variable like this:
DECLARE @tmp TABLE
(
    resNo INT,
    subres INT,
    startdate DATETIME,
    enddate DATETIME
)

INSERT @tmp
SELECT resNo, subres, MIN(startdate), MAX(enddate)
FROM t_resourcetable
GROUP BY resNo, subres
To update t_resourcetable, you could do this:
DELETE t_resourcetable
INSERT t_resourcetable
SELECT *
FROM @tmp
And run all of this in a transaction.
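As a rough illustration of why the transaction matters, here is a small Python sketch that uses the standard-library `sqlite3` module with an in-memory SQLite database standing in for SQL Server (an assumption made only so the example is runnable; the T-SQL above is what you would actually execute). The delete and the re-insert are committed together or rolled back together. Like the statements above, it groups only by resNo and subres, so it does not preserve gaps.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_resourcetable "
    "(resNo INT, subres INT, startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO t_resourcetable VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2012-01-02 22:03:00", "2012-01-03 00:00:00"),
        (1, 2, "2012-01-03 00:00:00", "2012-01-04 00:00:00"),
        (1, 2, "2012-01-04 00:00:00", "2012-01-04 16:23:00"),
        (1, 3, "2012-01-06 16:23:00", "2012-01-06 22:23:00"),
        (2, 2, "2012-01-04 05:23:00", "2012-01-06 16:23:00"),
    ],
)
conn.commit()

# "with conn" commits on success and rolls back if an exception is raised,
# so the table is never left empty or half filled.
with conn:
    cleaned = conn.execute(
        "SELECT resNo, subres, MIN(startdate), MAX(enddate) "
        "FROM t_resourcetable GROUP BY resNo, subres"
    ).fetchall()
    conn.execute("DELETE FROM t_resourcetable")
    conn.executemany(
        "INSERT INTO t_resourcetable VALUES (?, ?, ?, ?)", cleaned
    )

final_rows = conn.execute(
    "SELECT resNo, subres, startdate, enddate "
    "FROM t_resourcetable ORDER BY resNo, subres"
).fetchall()
for row in final_rows:
    print(row)
```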
I would create a temp table and fill it with the new, cleaned data. I think you need to group by the combined key of resNo and subres and select the minimum startdate and the maximum enddate.
Finally, delete all data in the old table and fill it with the data from the temp table.