How can I optimize this Update join query in SQL Server 2014?

Question

I am using an update join query to update some records. I am actually joining an indexed table to itself, and updating where a pattern is met.

This query worked fine for about a million records, but it does not scale to 14 million. I am doing it this way because the only other option I was aware of was a cursor, which would have been atrocious.

Right now the query is taking more than 12 hours to run. Any help to find a better way to do this would be GREATLY appreciated. I am using SQL Server Management Studio. For the query below, here is how the index was created in the AIS_Positions table:

CREATE INDEX SID ON AIS_Positions (Id)

UPDATE R1 
SET
    BOUNDARY = 'BERTH',
    TRAVEL_MODE = 'HOTEL',
    BerthStartFlag = 'YES',
    BerthStartTime = R1.IntervalStart,
    BerthEndTime = R2.IntervalEnd,
    BerthStart_ID = R1.Id,
    BerthEnd_ID = R2.Id
FROM 
    AIS_Positions R1
INNER JOIN 
    AIS_Positions R2 ON R1.MMSI = R2.MMSI
                     AND R1.ID < R2.ID
                     AND R1.IntervalSpeed <= 0.1
                     AND R2.IntervalSpeed <= 0.1
                     AND DATEDIFF(HOUR, R1.POSITIONTIME, R2.POSITIONTIME) BETWEEN 1 AND 72
                     AND (SELECT TOP 1 IntervalSpeed 
                          FROM AIS_Positions 
                          WHERE MMSI = R1.MMSI AND ID = R1.ID-1) > 0.1
                     AND (SELECT TOP 1 IntervalSpeed 
                          FROM AIS_Positions 
                          WHERE MMSI = R1.MMSI AND ID = R2.ID+1) > 0.1
                     AND (SELECT TOP 1 Boundary 
                          FROM AIS_Positions 
                          WHERE MMSI = R1.MMSI AND ID = R1.ID-1) IS NULL

Answer 1

This might be a good start:

/*
 create nonclustered index [ix_ais_positions_mmsi_inc] on ais_positions 
   (mmsi) 
   include (id, intervalspeed, boundary, PositionTime, IntervalStart, IntervalEnd);
*/


update R1 set
    boundary = 'BERTH',
    travel_mode = 'HOTEL',
    BerthStartFlag = 'YES',
    BerthStartTime = R1.IntervalStart,
    BerthEndTime = R2.IntervalEnd,
    BerthStart_id = R1.Id,
    BerthEnd_id = R2.Id
from ais_positions R1
inner join ais_positions R2
    on R1.mmsi = R2.mmsi
    and R1.id < R2.id
    --How many matches does R1.id < R2.id yield? Is this updating the same row more than once?
    and R1.IntervalSpeed <= 0.1
    and R2.IntervalSpeed <= 0.1
    --and datediff(hour, R1.positiontime, R2.positiontime) between 1 and 72
    and datediff(hour, R1.positiontime, R2.positiontime) >= 1 and datediff(hour, R1.positiontime, R2.positiontime) <= 72
    --and (select top 1 IntervalSpeed from ais_positions where mmsi = R1.mmsi and id = R1.id-1) > 0.1
    --and (select top 1 Boundary from ais_positions where mmsi = R1.mmsi and id=R1.id-1) is null
    --the two checks on the previous row (id = R1.id-1) are combined into one exists:
    and exists (select 1 from ais_positions i where i.mmsi = R1.mmsi and i.id = R1.id-1 and i.IntervalSpeed > 0.1 and i.Boundary is null)
    --and (select top 1 IntervalSpeed from ais_positions where mmsi = R1.mmsi and id = R2.id+1) > 0.1
    and exists (select 1 from ais_positions where mmsi = R1.mmsi and id = R2.id+1 and IntervalSpeed > 0.1)
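
The question raised in the comment above (does one R1 row match several R2 rows?) can be checked before running the update. A minimal sketch, assuming Id is unique and MMSI is an integer column; the @mmsi value is made up and only serves to keep the test small:

```sql
-- Count how many R2 rows each candidate R1 row would join to.
-- @mmsi is a hypothetical sample vessel; substitute any MMSI that exists in the table.
-- The INT type is an assumption; change it if MMSI is stored as text.
DECLARE @mmsi INT = 123456789;

SELECT R1.Id, COUNT(*) AS MatchCount
FROM AIS_Positions R1
INNER JOIN AIS_Positions R2
    ON R1.MMSI = R2.MMSI
   AND R1.Id < R2.Id
   AND R1.IntervalSpeed <= 0.1
   AND R2.IntervalSpeed <= 0.1
   AND DATEDIFF(HOUR, R1.PositionTime, R2.PositionTime) BETWEEN 1 AND 72
WHERE R1.MMSI = @mmsi
GROUP BY R1.Id
HAVING COUNT(*) > 1          -- only rows with more than one match
ORDER BY MatchCount DESC;
```

This leaves out the checks on the neighbouring rows, so it overcounts compared with the full update. If rows still match more than once with those checks added, SQL Server picks one of the matching R2 rows arbitrarily for BerthEndTime and BerthEnd_ID.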

Answer 2

Have you considered using temporary tables for the conditions of your subqueries? Your query may be running the subqueries once for each row produced by the join above them. Maybe something like this:

SELECT A1.ID, A1.IntervalSpeed as topint1
INTO #Int_tabl_1
FROM AIS_Positions as A1
INNER JOIN AIS_Positions as A2
ON A1.MMSI = A2.MMSI AND A1.ID = A2.ID -1

SELECT A1.ID, A1.IntervalSpeed as topint2
INTO #Int_tabl_2
FROM AIS_Positions as A1
INNER JOIN AIS_Positions as A2
ON A1.MMSI = A2.MMSI AND A1.ID = A2.ID+1

SELECT A1.ID, A1.Boundary
INTO #Bound_tbl
FROM AIS_Positions as A1
INNER JOIN AIS_Positions as A2
ON A1.MMSI = A2.MMSI AND A1.ID = A2.ID-1

Then test against topint1 > 0.1, topint2 > 0.1, and Boundary IS NULL.
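
One way to wire those tests into the update is to join the temp tables by ID. This is only a sketch: it assumes ID is unique across the table, and the aliases P, N and B and the index names are made up for illustration.

```sql
-- Optional: index the temp tables so the joins below can seek on ID.
-- Index names are hypothetical.
CREATE CLUSTERED INDEX ix_int1  ON #Int_tabl_1 (ID);
CREATE CLUSTERED INDEX ix_int2  ON #Int_tabl_2 (ID);
CREATE CLUSTERED INDEX ix_bound ON #Bound_tbl  (ID);

UPDATE R1
SET
    BOUNDARY = 'BERTH',
    TRAVEL_MODE = 'HOTEL',
    BerthStartFlag = 'YES',
    BerthStartTime = R1.IntervalStart,
    BerthEndTime = R2.IntervalEnd,
    BerthStart_ID = R1.Id,
    BerthEnd_ID = R2.Id
FROM AIS_Positions R1
INNER JOIN AIS_Positions R2
    ON R1.MMSI = R2.MMSI
   AND R1.ID < R2.ID
   AND R1.IntervalSpeed <= 0.1
   AND R2.IntervalSpeed <= 0.1
   AND DATEDIFF(HOUR, R1.POSITIONTIME, R2.POSITIONTIME) BETWEEN 1 AND 72
-- Row just before R1 must be moving (replaces the first subquery).
INNER JOIN #Int_tabl_1 P ON P.ID = R1.ID - 1 AND P.topint1 > 0.1
-- Row just after R2 must be moving (replaces the second subquery).
INNER JOIN #Int_tabl_2 N ON N.ID = R2.ID + 1 AND N.topint2 > 0.1
-- Row just before R1 must not be tagged yet (replaces the third subquery).
INNER JOIN #Bound_tbl  B ON B.ID = R1.ID - 1 AND B.Boundary IS NULL;
```

Because each temp table only holds rows that have a neighbour with the same MMSI, matching on ID alone is enough here as long as ID is unique.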
