
SQL Server Update query very slow

I ran the following query on a previous year's data and it took 3 hours; this year it took 13 days. I don't know why. Any help would be much appreciated.

I have just tested the queries on the old SQL Server and they complete in 3 hours. Therefore the problem must have something to do with the new SQL Server I created. Do you have any ideas what the problem might be?

The query:

USE [ABCJan]
CREATE INDEX Link_Oct ON ABCJan2014 (Link_ref)
GO
CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO

UPDATE   ABCJan2014
SET      ABCJan2014.link_id = LT.link_id
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref

UPDATE   ABCJan2014
SET      SumAvJT  = ABCJan2014.av_jt * ABCJan2014.n

UPDATE   ABCJan2014
SET      ABCJan2014.DayType = LT2.DayType
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[ABC_20142015_days] LT2
ON  MT.date_1 = LT2.date1

With the following data structures:

ABCJan2014 (70 million rows - NO UNIQUE IDENTIFIER - Link_ref & date_1 together are unique)

Link_ID nvarchar (17)
Link_ref    int
Date_1  smalldatetime
N       int
Av_jt       int
SumAvJT decimal(38,14)
DayType nvarchar (50)

LookUp_ABC_20142015

Link_ID nvarchar (17) PRIMARY KEY
Link_ref    int INDEXED
Link_metres int

ABC_20142015_days

Date1   smalldatetime   PRIMARY KEY & INDEXED
DayType nvarchar(50)

EXECUTION PLAN (screenshot)

It appears to be this part of the query that is taking such a long time.

Thanks again for any help, I'm pulling my hair out.

Create an index on the ABCJan2014 table, as it is currently a heap.

If you look at the execution plan, the time is in the actual update.

Look at the log file:
Is the log file on a fast disk?
Is the log file on the same physical disk as the data?
Is the log file required to grow?
Size the log file to about 1/2 the size of the data file.
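The log-file checks above can be scripted. A sketch, assuming the database is named ABCJan; the logical log-file name ABCJan_log is an assumption, so check the query output first:

```sql
-- Where are the files, how big are they, and how do they grow?
SELECT name, physical_name, size * 8 / 1024 AS size_mb, growth
FROM   ABCJan.sys.database_files;

-- Pre-size the log so the update never has to wait for autogrow
-- (NAME is the logical name from sys.database_files, assumed here)
ALTER DATABASE ABCJan
MODIFY FILE (NAME = ABCJan_log, SIZE = 20GB, FILEGROWTH = 1GB);
```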

As far as indexes go, test and tune with these counts.
If the join columns are indexed, there is not much to do here.

select   count(*) 
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref

select   count(*) 
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[ABC_20142015_days] LT2
ON  MT.date_1 = LT2.date1

Start with a TOP (1000) to get update tuning working.
For grins, please give this a try.
Please post this query plan.
(Do NOT add an index to ABCJan2014 link_id.)

UPDATE   top (1000) MT
SET      MT.link_id = LT.link_id
FROM     ABCJan2014 MT
JOIN     [Central].[dbo].[LookUp_ABC_20142015] LT
          ON MT.Link_ref = LT.Link_ref 
         AND MT.link_id <> LT.link_id

If LookUp_ABC_20142015 is not active, then add a NOLOCK hint:

JOIN     [Central].[dbo].[LookUp_ABC_20142015] LT with (nolock)

nvarchar(17) for a PK seems strange to me.
Why nvarchar? Do you really have some Unicode?
Why not just char(17) and let it allocate the space?

Why have 3 update statements when you can do it in one?

UPDATE   MT
SET      MT.link_id = CASE WHEN LT.link_id IS NULL THEN MT.link_id ELSE LT.link_id END,
         MT.SumAvJT  = MT.av_jt * MT.n,
         MT.DayType = CASE WHEN LT2.DayType IS NULL THEN MT.DayType ELSE LT2.DayType END
FROM     ABCJan2014 MT
LEFT OUTER JOIN  [Central].[dbo].[LookUp_ABC_20142015] LT
    ON MT.Link_ref = LT.Link_ref
LEFT OUTER JOIN  [Central].[dbo].[ABC_20142015_days] LT2
    ON MT.date_1 = LT2.date1

Also, I would create only one index for the join. Create the following index after the updates.

CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO

Before you run it, compare the execution plans by putting the update query above and your 3 update statements together in one query window and choosing Display Estimated Execution Plan. It will show the estimated percentages, and you'll be able to tell whether the new one is any better (if it is < 50%).

Also, it looks like the query is slow because it's doing a Hash Match. Please add a PK index on [LookUp_ABC_20142015].Link_ref.

[LookUp_ABC_20142015].Link_ID is a bad choice for PK, so drop the PK on that column.

Then add an index to [ABCJan2014].Link_ref.

See if that makes any improvement.

If you are going to update a table, you need a unique identifier, so put one on ABCJan2014 ASAP, especially since it is so large. There is no reason why you can't create a unique index on the fields that together compose the unique record. In the future, do not ever design a table that does not have a unique index or PK. This is simply asking for trouble, both in processing time and, more importantly, in data integrity.
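Since Link_ref and date_1 are stated to be unique together, that uniqueness can be enforced directly. A sketch (the index name is arbitrary):

```sql
-- Enforce the known uniqueness; on a 70-million-row heap this will
-- take a while and needs substantial free space while it builds.
CREATE UNIQUE CLUSTERED INDEX UQ_ABCJan2014_LinkRef_Date1
ON ABCJan2014 (Link_ref, date_1);
```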

When you have a lot of updating to do to a large table, it is sometimes more effective to work in batches. You don't tie up the table in a lock for a long period of time, and sometimes it is even faster due to how the database internals work the problem. Consider processing around 50,000 records at a time (you may need to experiment to find the sweet spot; there is generally a point where the update starts to take significantly longer) in a loop or cursor.

UPDATE MT
SET MT.link_id = LT.link_id
FROM ABCJan2014 MT
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT ON MT.Link_ref = LT.Link_ref

The code above will update all records from the join. If some of the records already have the link_id, you might save considerable time by only updating the records where link_id is null or ABCJan2014.link_id <> LT.link_id. You have a 70 million record table; you don't need to be updating records that do not need a change. The same thing, of course, applies to your other updates as well.
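Combining the batching idea with the only-update-what-changed filter, a sketch of the loop (50,000 rows per batch is a starting point, not a tuned value):

```sql
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) MT
    SET    MT.link_id = LT.link_id
    FROM   ABCJan2014 MT
    JOIN   [Central].[dbo].[LookUp_ABC_20142015] LT
           ON MT.Link_ref = LT.Link_ref
    WHERE  MT.link_id IS NULL OR MT.link_id <> LT.link_id;

    SET @rows = @@ROWCOUNT;  -- loop ends when nothing was left to change
END
```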

Not knowing how much data gets added to this table or how often this number needs updating, consider that SumAvJT might be best defined as a persisted computed column. Then it gets updated automatically when one of the two values changes. This wouldn't help if the table is bulk loaded, but it might if records come in individually.
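A sketch of converting SumAvJT to a persisted computed column (the existing column has to be dropped first):

```sql
ALTER TABLE ABCJan2014 DROP COLUMN SumAvJT;
-- note: av_jt * n is int arithmetic; add a CAST if decimal output is needed
ALTER TABLE ABCJan2014 ADD SumAvJT AS (av_jt * n) PERSISTED;
```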

In the execution plan, it makes recommendations for indexes to be added. Have you created those indexes? Also, take a look at your older server's data structure - script out the table structures including indexes - and see if there are differences between them. At some point somebody has possibly built an index on your old server's tables to make this more efficient.

That said, what volume of data are you looking at? If you're looking at significantly different volumes of data, it could be that the execution plans generated by the servers differ significantly. SQL Server doesn't always guess right, when it builds the plans.

Also, are you using prepared statements (i.e., stored procedures)? If you are, then it's possible that the cached data access plan is simply out of date and needs to be updated, or you need to update statistics on the tables and then run the procedure with recompile so that a new data access plan is generated.
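A sketch of refreshing statistics and forcing a fresh plan; the procedure name below is hypothetical:

```sql
UPDATE STATISTICS ABCJan2014 WITH FULLSCAN;

-- Invalidate the procedure's cached plan (hypothetical name),
-- or execute it once with a one-off recompile:
EXEC sp_recompile N'dbo.usp_LoadABCJan2014';
EXEC dbo.usp_LoadABCJan2014 WITH RECOMPILE;
```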

Where is the [Central] server located? Is it possible to duplicate your [Central].[dbo].[LookUp_ABC_20142015] and [Central].[dbo].[ABC_20142015_days] tables locally?

1) Do:

  select * into [ABC_20142015_days] from [Central].[dbo].[ABC_20142015_days]
  select * into [LookUp_ABC_20142015] from [Central].[dbo].[LookUp_ABC_20142015]  

2) Recreate the indexes on [ABC_20142015_days] and [LookUp_ABC_20142015].

3) Rewrite your updates by removing the "[Central].[dbo]." prefix !

Just after writing this solution, I found another one, but I'm not sure whether it's applicable to your server: adding the "REMOTE" join hint... I have never used it, but you can find the documentation at https://msdn.microsoft.com/en-us/library/ms173815.aspx
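Untested, but applied to the first update the REMOTE hint would look like this (it asks SQL Server to perform the join on the site of the right-hand, remote table, which must actually be a remote/linked-server table):

```sql
UPDATE MT
SET    MT.link_id = LT.link_id
FROM   ABCJan2014 MT
INNER REMOTE JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
       ON MT.Link_ref = LT.Link_ref;
```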

Hoping it helps you...

All the previous answers that suggest improving the structure of the tables and the queries themselves are good to know; there is no doubt about that.

However, your question is why the SAME data/structure and the SAME queries produce this huge difference.

So before you look at optimising the SQL, you must find the real cause. And the real cause is hardware, software, or configuration. Start by comparing the SQL Server installation with the old one, then move on to the hardware and benchmark it. Lastly, look at the software for differences.

Only when you have solved the actual problem can you start improving the SQL itself.

ALTER TABLE dbo.ABCJan2014
    ADD SumAvJT AS av_jt * n --PERSISTED (the existing SumAvJT column must be dropped first)

CREATE INDEX ix_Link_ref ON ABCJan2014 (Link_ref) INCLUDE (link_id)
GO
CREATE INDEX ix_date_1 ON ABCJan2014 (date_1) INCLUDE (DayType)
GO

UPDATE MT
SET MT.link_id = LT.link_id
FROM ABCJan2014 MT
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT ON MT.Link_ref = LT.Link_ref

UPDATE MT
SET MT.DayType = LT2.DayType
FROM ABCJan2014 MT
JOIN [Central].[dbo].[ABC_20142015_days] LT2 ON MT.date_1 = LT2.date1

I guess there is a lot of page splitting. Can you try this?

SELECT

(SELECT LT.link_id FROM [Central].[dbo].[LookUp_ABC_20142015] LT 
WHERE MT.Link_ref = LT.Link_ref) AS Link_ID,
Link_ref,
Date_1,
N,
Av_jt,
MT.av_jt * MT.n AS SumAvJT,
(SELECT LT2.DayType FROM [Central].[dbo].[ABC_20142015_days] LT2 
WHERE MT.date_1 = LT2.date1) AS DayType

INTO ABCJan2014new
FROM ABCJan2014 MT

In addition to all the answers above:

i) Even 3 hours is a lot. If any query takes 3 hours, I first check my requirements and revise them, and raise the issue. Of course I would also optimize my query. In your case, none of the updates appears to be a serious matter.

As @Devart pointed out, one of the columns can be a computed column.

ii) Try running other queries on the new server and compare.

iii) Rebuild the indexes.

iv) Use WITH (NOLOCK) in your joins.

v) Create an index on table LookUp_ABC_20142015, column Link_ref.

vi) A clustered index on nvarchar(17) or datetime is always a bad idea. A join on a datetime or varchar column always takes time.
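For point iii, rebuilding all indexes on the big table can be done in one statement:

```sql
-- Rebuilds every index on the table; on 70M rows this is not instant
ALTER INDEX ALL ON ABCJan2014 REBUILD;
```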

Try using an alias instead of repeating the table name in the UPDATE query:

USE [ABCJan]
CREATE INDEX Link_Oct ON ABCJan2014 (Link_ref)
GO
CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO

UPDATE   MT
SET      MT.link_id = LT.link_id
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref

UPDATE   ABCJan2014
SET      SumAvJT  = av_jt * n

UPDATE   MT
SET      MT.DayType = LT2.DayType
FROM     ABCJan2014 MT
INNER JOIN  [Central].[dbo].[ABC_20142015_days] LT2
ON  MT.date_1 = LT2.date1

Frankly, I think you've already answered your own question.

ABCJan2014 (70 million rows - NO UNIQUE IDENTIFIER - Link_ref & date_1 together are unique)

If you know the combination is unique, then by all means 'enforce' it. That way the server will know it too and can make use of it.

Query Plan showing the need for an index on [ABCJAN2014].[date_1] 3 times in a row!

You shouldn't believe everything that MSSQL tells you, but you should at least give it a try =)

Combining both, I'd suggest you add a PK to the table on the fields [date_1] and [Link_ref] (in that order). Mind you, adding a Primary Key -- which is essentially a clustered unique index -- will take a while and require a lot of space, as the table pretty much gets duplicated along the way.
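A sketch of that primary key (both columns must be NOT NULL; otherwise a unique clustered index does the same job for the optimizer):

```sql
ALTER TABLE ABCJan2014
ADD CONSTRAINT PK_ABCJan2014 PRIMARY KEY CLUSTERED (date_1, Link_ref);
```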

As far as your query goes, you could put all 3 updates in 1 statement (similar to what joordan831 suggests), but you should take care about the fact that a JOIN might limit the number of rows affected. As such, I'd rewrite it like this:

UPDATE MT
SET    MT.link_id = (CASE WHEN LT.Link_ref IS NULL THEN MT.link_id ELSE LT.link_id END), -- update when there is a match, otherwise re-use existing value
       MT.DayType = (CASE WHEN LT2.date1   IS NULL THEN MT.DayType ELSE LT2.DayType END), -- update when there is a match, otherwise re-use existing value
       MT.SumAvJT = MT.av_jt * MT.n

FROM     ABCJan2014 MT
LEFT OUTER JOIN  [Central].[dbo].[LookUp_ABC_20142015] LT
             ON MT.Link_ref = LT.Link_ref

LEFT OUTER JOIN [Central].[dbo].[ABC_20142015_days] LT2
             ON MT.date_1 = LT2.date1

which should have the same effect as running your original 3 updates sequentially; but hopefully taking a lot less time.

PS: Going by the Query Plans, you already have indexes on the tables you JOIN to ([LookUp_ABC_20142015] & [ABC_20142015_days]), but they seem to be non-unique (and not always clustered). Assuming they're suffering from the 'we know it's unique but the server doesn't' illness: it would be advisable to also add a Primary Key to those tables on the fields you join to, both for data-integrity and performance reasons!

Good luck.

UPDATE data
SET    data.abcKey = surrogate.abcKey
FROM   [MyData].[dbo].[fAAA_Stage] data            -- NOLOCK is not allowed on the target table of an UPDATE
JOIN   [MyData].[dbo].[dBBB_Surrogate] surrogate WITH (NOLOCK)
       ON data.MyKeyID = surrogate.MyKeyID

The surrogate table must have a unique nonclustered index on MyKeyID. The performance improvement is significant.
