How to avoid table scan on SQL Server in this situation

There are two tables, Costs and Logs. The Costs table can hold millions of rows, and the Logs table billions.

I need to update the CostBy column in the Costs table from a service task in a production environment, processing 100 records per run.

CREATE TABLE Costs
(
    C_PK uniqueidentifier primary key not null,
    CostBy varchar(3) not null
)

CREATE TABLE Logs
(
    L_PK uniqueidentifier not null primary key,
    L_ParentTable varchar(255) not null,  -- 'Costs' and other table names
    L_ParentID uniqueidentifier not null, -- the Costs PK and other tables' PKs
    L_Event varchar(3) not null,          -- 'ADD' and other event types
    L_User varchar(3) not null
)

CREATE NONCLUSTERED INDEX [L_ParentID] 
    ON [dbo].[Logs] ([L_ParentID] ASC)

Here is the original update statement:

UPDATE TOP(100) Costs
SET CostBy = ISNULL(L_User, '~UK')
FROM Costs
LEFT JOIN Logs ON L_ParentID = C_PK AND L_Event = 'ADD'
WHERE CostBy = ''

However, this statement causes a massive performance problem: an expensive table scan on the Costs table.

My question is: how can I avoid the table scan on the Costs table, or otherwise optimize the update statement?

Thanks in advance.

You may want to try the following.

First, create an index on Logs, including all the relevant columns:

CREATE INDEX ix ON Logs 
(
  L_ParentID -- join condition, variable
)
INCLUDE 
(
  L_User -- not a filter condition, but used in your UPDATE
)
WHERE 
(
  L_Event = 'ADD' -- join condition, constant
)

If only a single row with the ADD event will ever exist for a given parent ID, make this a unique index, as uniqueness can dramatically improve performance.
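For instance, the unique variant would look like this (a sketch, assuming your data really does guarantee at most one ADD row per parent; the CREATE will fail if duplicates already exist):

```sql
-- Unique filtered index: enforces, and lets the optimizer exploit,
-- at most one 'ADD' row per parent ID.
CREATE UNIQUE INDEX ix ON Logs
(
  L_ParentID
)
INCLUDE
(
  L_User
)
WHERE L_Event = 'ADD'
```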

Second, and this is more hit and miss, you may try an index on Costs (CostBy), because you are only looking for empty CostBy values to update. This index has to be maintained by the very query that updates the column, so it may slow your query down instead of speeding it up; it depends on a number of factors.
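A sketch of that idea, here written as a filtered index so it only covers the rows still awaiting an update (the index name and the filtered form are my choices, not something the question requires):

```sql
-- Filtered index covering only the not-yet-updated rows.
-- Rows drop out of the index as soon as CostBy is set,
-- so it stays small as the backfill progresses.
CREATE NONCLUSTERED INDEX ix_Costs_CostBy
    ON Costs (CostBy)
    WHERE CostBy = ''
```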

If you have an Enterprise license (from SQL Server 2016 SP1 onward, data compression is available in other editions too), try both indexes with WITH (DATA_COMPRESSION = PAGE); it can significantly improve I/O time at the expense of CPU. Which wins depends on where your bottleneck is.
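Applied to the Logs index above, the option goes in the WITH clause at the end of the statement (again just a sketch of the syntax):

```sql
CREATE INDEX ix ON Logs (L_ParentID)
INCLUDE (L_User)
WHERE L_Event = 'ADD'
WITH (DATA_COMPRESSION = PAGE)
```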

Additionally, depending on the nature of your data, updating statistics may improve your queries. If rows with CostBy = '' are disproportionately frequent (or rare) compared to other values, you may benefit from full-scan statistics on that column. Consider NORECOMPUTE if you only need them for this specific query, this one time.

CREATE STATISTICS st_Costs_CostBy
ON Costs (CostBy)  
WITH FULLSCAN, NORECOMPUTE;  
