Merging records from multiple rows in a SQL Server table

I have some dirty resource usage records in t_resourcetable, which look like this:

resNo   subres    startdate                        enddate
1        2        2012-01-02 22:03:00.000          2012-01-03 00:00:00.000
1        2        2012-01-03 00:00:00.000          2012-01-04 00:00:00.000
1        2        2012-01-04 00:00:00.000          2012-01-04 16:23:00.000
1        3        2012-01-06 16:23:00.000          2012-01-06 22:23:00.000
2        2        2012-01-04 05:23:00.000          2012-01-06 16:23:00.000

I need those dirty rows to be merged like this:

resNo   subres    startdate                        enddate
1        2        2012-01-02 22:03:00.000          2012-01-04 16:23:00.000
1        3        2012-01-06 16:23:00.000          2012-01-06 22:23:00.000
2        2        2012-01-04 05:23:00.000          2012-01-06 16:23:00.000

The result should be written back to the same table. I have more than 40k rows, so I cannot use a cursor. Please help me clean this up with more optimized SQL statements.

The solution provided does not handle a scenario like this, where the same resNo and subres have a gap between intervals:

resNo   subres    startdate                        enddate
1        2        2012-01-02 22:03:00.000          2012-01-03 00:00:00.000
1        2        2012-01-03 00:00:00.000          2012-01-04 00:00:00.000
1        2        2012-01-04 00:00:00.000          2012-01-04 16:23:00.000
1        2        2012-01-14 10:09:00.000          2012-01-15 00:00:00.000
1        2        2012-01-15 00:00:00.000          2012-01-16 00:00:00.000
1        2        2012-01-16 00:00:00.000          2012-01-16 03:00:00.000
1        3        2012-01-06 16:23:00.000          2012-01-06 22:23:00.000
2        2        2012-01-04 05:23:00.000          2012-01-06 16:23:00.000

I need those dirty rows to be merged like this:

resNo   subres    startdate                        enddate
1        2        2012-01-02 22:03:00.000          2012-01-04 16:23:00.000
1        2        2012-01-14 10:09:00.000          2012-01-16 03:00:00.000
1        3        2012-01-06 16:23:00.000          2012-01-06 22:23:00.000
2        2        2012-01-04 05:23:00.000          2012-01-06 16:23:00.000

Please assist me with this dirty data problem.
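
To try statements against the second sample above, a setup script along these lines could be used in a scratch database. This is only a sketch: the column types (INT and DATETIME) are assumptions inferred from the sample values, and the real table may have keys or extra columns that are not shown in the question.

```sql
-- Assumed schema: types are inferred from the sample data, not stated in the question.
CREATE TABLE t_resourcetable
(
    resNo     INT      NOT NULL,
    subres    INT      NOT NULL,
    startdate DATETIME NOT NULL,
    enddate   DATETIME NOT NULL
);

-- ISO 8601 literals (with the T separator) are parsed the same way
-- regardless of the session's language or DATEFORMAT setting.
-- INSERT ... SELECT ... UNION ALL is used so the script also runs on SQL Server 2005.
INSERT INTO t_resourcetable (resNo, subres, startdate, enddate)
SELECT 1, 2, '2012-01-02T22:03:00', '2012-01-03T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-03T00:00:00', '2012-01-04T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-04T00:00:00', '2012-01-04T16:23:00' UNION ALL
SELECT 1, 2, '2012-01-14T10:09:00', '2012-01-15T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-15T00:00:00', '2012-01-16T00:00:00' UNION ALL
SELECT 1, 2, '2012-01-16T00:00:00', '2012-01-16T03:00:00' UNION ALL
SELECT 1, 3, '2012-01-06T16:23:00', '2012-01-06T22:23:00' UNION ALL
SELECT 2, 2, '2012-01-04T05:23:00', '2012-01-06T16:23:00';
```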

MERGE INTO t_resourcetable AS TARGET
USING (
    SELECT
        resNo, subres,
        MIN(startdate) as startdate,
        MAX(enddate) as enddate
    FROM t_resourcetable
    GROUP BY resNo, subres
) AS SOURCE
ON TARGET.resNo = SOURCE.resNo
AND TARGET.subres = SOURCE.subres
AND TARGET.startdate = SOURCE.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

Edit: To respect the gaps in the intervals:

MERGE INTO t_resourcetable AS TARGET
USING (
    -- Find the first item in each interval group
    SELECT
        resNo, subres, startdate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from behind
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate < t1.startdate
        AND t2.enddate >= t1.startdate
    )
) AS SOURCE_start
INNER JOIN (
    -- Find the last item in each interval group
    SELECT
        resNo, subres, enddate,
        row_number() over (partition by resNo, subres order by startdate) as rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from ahead
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate <= t1.enddate
        AND t2.enddate > t1.enddate
    )
) AS SOURCE_end
    ON SOURCE_start.resNo = SOURCE_end.resNo
    AND SOURCE_start.subres = SOURCE_end.subres
    AND SOURCE_start.rn = SOURCE_end.rn -- Join by row number
ON TARGET.resNo = SOURCE_start.resNo
AND TARGET.subres = SOURCE_start.subres
AND TARGET.startdate = SOURCE_start.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE_end.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

Result:

resNo   subres   startdate          enddate
    1        2   2012-01-02 22:03   2012-01-04 16:23
    1        2   2012-01-14 10:09   2012-01-16 03:00
    1        3   2012-01-06 16:23   2012-01-06 22:23
    2        2   2012-01-04 05:23   2012-01-06 16:23
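
Because the MERGE deletes rows, it may be worth previewing the merged intervals first. The query below is a read-only sketch that reuses the same start/end detection as the MERGE source above; the CTE names `starts` and `ends` are made up for illustration. Like the MERGE, it assumes that each interval group has exactly one row that qualifies as its start and one that qualifies as its end.

```sql
WITH starts AS (
    -- First item in each interval group: nothing touches or overlaps it from behind
    SELECT resNo, subres, startdate,
           ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable AS t1
    WHERE NOT EXISTS (
        SELECT NULL
        FROM t_resourcetable AS t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate < t1.startdate
        AND t2.enddate >= t1.startdate
    )
),
ends AS (
    -- Last item in each interval group: nothing touches or overlaps it from ahead
    SELECT resNo, subres, enddate,
           ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable AS t1
    WHERE NOT EXISTS (
        SELECT NULL
        FROM t_resourcetable AS t2
        WHERE t2.resNo = t1.resNo
        AND t2.subres = t1.subres
        AND t2.startdate <= t1.enddate
        AND t2.enddate > t1.enddate
    )
)
-- Pair the n-th start with the n-th end inside each resNo/subres partition
SELECT s.resNo, s.subres, s.startdate, e.enddate
FROM starts AS s
INNER JOIN ends AS e
    ON e.resNo = s.resNo
    AND e.subres = s.subres
    AND e.rn = s.rn
ORDER BY s.resNo, s.subres, s.startdate;
```

Run against the original, unmerged data, this should list the same rows as the result above without modifying the table.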

Edit: If there is any risk of concurrent edits on the target table, you might want to add the HOLDLOCK hint. This will prevent any primary key violation errors, and be slightly more resource-efficient. (Thanks Joey):

MERGE INTO t_resourcetable WITH (HOLDLOCK) AS TARGET
...

For SQL Server 2005 you could do something like this:

create table #temp
(
  resNo int,
  subres int,
  enddate datetime,
  primary key (resNo, subres)
)

-- Store the values you need for enddate in a temp table
insert into #temp
select resNo, 
       subres,
       max(enddate) as enddate
from t_resourcetable
group by resNo, subres

-- Delete duplicates keeping the row with min startdate
delete T
from (
        select row_number() over(partition by resNo, subres order by startdate) as rn
        from t_resourcetable
     ) as T
where rn > 1

-- Set enddate where needed
update T set enddate = tmp.enddate
from t_resourcetable as T
  inner join #temp as tmp
    on T.resNo = tmp.resNo and
       T.subres = tmp.subres
where T.enddate <> tmp.enddate

drop table #temp

You could first store the result in a table variable like this:

DECLARE @tmp TABLE
(
    resNo INT, 
    subres INT, 
    startdate DATETIME, 
    enddate DATETIME
)

INSERT   @tmp
SELECT   resNo, subres, MIN(startdate), MAX(enddate)
FROM     t_resourcetable
GROUP BY resNo, subres

To update t_resourcetable, you could then do this:

DELETE   t_resourcetable

INSERT   t_resourcetable
SELECT   * 
FROM     @tmp

And run all of this in a transaction.
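
One way the transaction could be wrapped is sketched below, using TRY...CATCH (available from SQL Server 2005). It assumes the table has exactly the four columns shown in the question, so the column list is spelled out instead of relying on SELECT *. Note that, like the statements above, it groups only by resNo and subres, so intervals separated by a gap are still collapsed into one row.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    DECLARE @tmp TABLE
    (
        resNo INT,
        subres INT,
        startdate DATETIME,
        enddate DATETIME
    );

    -- Collect one merged row per resNo/subres
    INSERT @tmp (resNo, subres, startdate, enddate)
    SELECT   resNo, subres, MIN(startdate), MAX(enddate)
    FROM     t_resourcetable
    GROUP BY resNo, subres;

    -- Replace the table contents with the merged rows
    DELETE t_resourcetable;

    INSERT t_resourcetable (resNo, subres, startdate, enddate)
    SELECT resNo, subres, startdate, enddate
    FROM   @tmp;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the DELETE and INSERT if anything failed, then report the error
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
END CATCH;
```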

I would create a temp table and fill it with the new, cleaned data. I think you need to group by the combined key of resNo and subres and select the min startdate and max enddate.

Finally, delete all data in the old table and fill it with the data from the temp table.

