简体   繁体   中英

SQL Server: Strange execution plan

I have some ETL code running on SQL Server 2008 Standard. A relatively small number of rows (~50,000) are processed and loaded into a temp table. I then execute an insert query to copy over the rows which are not present in the much larger table (~1,000,000+ rows). The temp table contains the same primary key and clustered index as the destination table.

create table #NewClaims(ExtractDate datetime, SiteName nvarchar(50), SiteCd nvarchar(50), ContractTypeCd nvarchar(50), 
ClaimRateType nvarchar(50), ClaimRateTypeCd nvarchar(50), ClaimStatus nvarchar(50), ClaimStatusCd nvarchar(50), 
CreationDt datetime, StatusDt datetime, ClaimID nvarchar(50), SeqNum int, CreationUserID nvarchar(50), 
SpecialClaimInd nvarchar(50), JobSeekerID nvarchar(50), InvoiceNum nvarchar(50), JobID nvarchar(50), JobRefId int,
RecoveryReason nvarchar(50), ClaimAmount money, GSTAmount money, ApprovedAmount money, ClaimCurrencyInd nvarchar(50), 
EmployerID nvarchar(50), BaseRateType nvarchar(50), BaseRateTypeCd nvarchar(50)
constraint PK_NewClaims primary key clustered(ClaimID, ClaimStatusCD));

Here is the SQL that loads the temp table records into the target

insert into dbo.Claim(
ExtractDate, SiteName, SiteCd, ContractTypeCd, ClaimRateType, ClaimRateTypeCd, ClaimStatus, ClaimStatusCd, CreationDt, 
StatusDt, ClaimID, SeqNum, CreationUserID, SpecialClaimInd, JobSeekerID, InvoiceNum, JobID, JobSeqNum, RecoveryReason, 
ClaimAmount, GSTAmount, ApprovedAmount, ClaimCurrencyInd, EmployerID, BaseRateType, BaseRateTypeCd
)
select 
n.ExtractDate,  n.SiteName, n.SiteCd, n.ContractTypeCd, n.ClaimRateType, n.ClaimRateTypeCd, n.ClaimStatus, n.ClaimStatusCd, 
n.CreationDt, n.StatusDt, n.ClaimID, n.SeqNum, n.CreationUserID, n.SpecialClaimInd, n.JobSeekerID, n.InvoiceNum, n.JobID, 
n.JobRefId, n.RecoveryReason, n.ClaimAmount, n.GSTAmount, n.ApprovedAmount, n.ClaimCurrencyInd, n.EmployerID, n.BaseRateType, 
n.BaseRateTypeCd
from #NewClaims as n
left join dbo.Claim as c on n.ClaimID = c.ClaimID and n.ClaimStatusCd = c.ClaimStatusCd
where c.ClaimID is null

When I run this the execution plan does something unusual. It refuses to utilise the PK clustered index on the dest table and attempts to use any other index available instead. Curiously it uses this index to retreieve the ClaimID and StatusCd. If I disable the indexes on the dest table one by one, the execution plan will keep trying to use other indexes until I've disabled all of them except the clustered, at which point it finally gives in and uses it, but produces a bunch of bitmap operations. The query runs faster when this occurs.

I've also experimented with adding an index hint: with (index(1)). This hint causes it to work as I expect, using the index and running much faster than the non hint version. The forced index seek shows scalar operators in the execution plan - does this indicate a problem?

Seek Keys[1]: Prefix: [ESD4].[dbo].[Claim].ClaimID, [ESD4].[dbo].[Claim].ClaimStatusCd = Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimID] as [n].[ClaimID]), Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimStatusCd] as [n].[ClaimStatusCd])

Is there something I'm missing? I don't like having to force SQL to use a specific execution plan, as index hints can often backfire.

Update

  • Recomputing the table statistics didn't help
  • Changing from left join to "where not exists" doesn't affect the choice of index used, but it did change the plan slightly. Instead of the hash match (Left Join), it's now doing Hash Match (Left Anti Semi Join), which is presumably faster.

It's difficult to say without more information, but you can test this. Sometimes it may be faster to get an output of the PK IDs that you want to insert into a table variable (@PKIDs), then in the update statement, use 'select ... from #newClaims inner join @PKIDs'. It's difficult to say without the plan and having the dataset, but maybe in your situation it could help. If you inner join, you can use an index hint for the PK. The #NewClaims table may also only need a PK on ClaimID and ClaimStatusCd, reducing the time to build #NewClaims.

declare @PKIDs table (ClaimID INT PRIMARY KEY)

insert into @PKIDs (ClaimID)
select distinct n.ClaimID
from #NewClaims as n
where not exists (select 1 from dbo.Claim as c where c.ClaimID = n.ClaimID and n.ClaimStatusCd = c.ClaimStatusCd)

insert into dbo.Claim(
ExtractDate, SiteName, SiteCd, ContractTypeCd, ClaimRateType, ClaimRateTypeCd, ClaimStatus, ClaimStatusCd, CreationDt, 
StatusDt, ClaimID, SeqNum, CreationUserID, SpecialClaimInd, JobSeekerID, InvoiceNum, JobID, JobSeqNum, RecoveryReason, 
ClaimAmount, GSTAmount, ApprovedAmount, ClaimCurrencyInd, EmployerID, BaseRateType, BaseRateTypeCd
)
select 
n.ExtractDate,  n.SiteName, n.SiteCd, n.ContractTypeCd, n.ClaimRateType, n.ClaimRateTypeCd, n.ClaimStatus, n.ClaimStatusCd, 
n.CreationDt, n.StatusDt, n.ClaimID, n.SeqNum, n.CreationUserID, n.SpecialClaimInd, n.JobSeekerID, n.InvoiceNum, n.JobID, 
n.JobRefId, n.RecoveryReason, n.ClaimAmount, n.GSTAmount, n.ApprovedAmount, n.ClaimCurrencyInd, n.EmployerID, n.BaseRateType, 
n.BaseRateTypeCd
from #NewClaims as n
join @PKIDs as pkids on pkids.ClaimID = n.ClaimID

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM