Including column improves query performance SQL Server 2008

Question

A query's performance changes depending on whether a particular column is included, and the strange part is that including the column improves it (execution time goes down).

The query joins a view, some tables, and a table-valued function, like this:

SELECT 
    v1.field1, t2.field2
FROM 
    view v1 WITH (nolock)
INNER JOIN 
    table t1 WITH (nolock) ON v1.field1 = t1.field1
INNER JOIN 
    table2 t2 WITH (nolock) ON t2.field2 = t1.field2
INNER JOIN 
    function1(@param) f1 ON f1.field3 = t2.field3
WHERE 
    (v1.date1 = @param OR v1.date2 = @param)

If I include in the SELECT list a varchar(200) NOT NULL column that belongs to the view (it is not indexed in the underlying table or in the view, and it is not part of any constraint), the query takes X seconds. If I leave it out, the time rises to 4X seconds, which is a big difference for a single column. So the better-performing query looks like this:

SELECT 
    v1.field1, t2.field2, v1.fieldWhichAffectsPerformance
FROM 
    view v1 WITH (nolock)
INNER JOIN 
    table t1 WITH (nolock) ON v1.field1 = t1.field1
INNER JOIN 
    table2 t2 WITH (nolock) ON t2.field2 = t1.field2
INNER JOIN 
    function1(@param) f1 ON f1.field3 = t2.field3
WHERE 
    (v1.date1 = @param OR v1.date2 = @param)

I have to remove the column that improves the query's performance, but without making the current performance worse. Any ideas?

EDIT: As suggested, I've reviewed the execution plans. The query without the column runs an extra hash match (left outer join) and uses index scans, which cost a lot of CPU, instead of the index seeks that appear in the plan for the query with the column included. How can I remove the column without hurting performance?

Answer 1

Optimizers are complicated. Without query plans, there is only speculation. You need to look at the query plans to get a real answer.
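As a rough sketch of how the two plans could be captured side by side: the batch below runs both variants with statistics output switched on. The object names are the placeholders from the question (bracketed here because `view` and `table` are reserved words), and the `datetime` type and value of `@param` are assumptions made only so the batch is self-contained.

```sql
-- Assumption: @param is a datetime; the value is arbitrary.
DECLARE @param datetime = '20171228';

SET STATISTICS IO ON;    -- reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time
SET STATISTICS XML ON;   -- actual execution plan, returned as XML

-- Variant A: without the extra column (the slow one in the question)
SELECT v1.field1, t2.field2
FROM [view] v1 WITH (nolock)
INNER JOIN [table] t1 WITH (nolock) ON v1.field1 = t1.field1
INNER JOIN table2 t2 WITH (nolock) ON t2.field2 = t1.field2
INNER JOIN function1(@param) f1 ON f1.field3 = t2.field3
WHERE (v1.date1 = @param OR v1.date2 = @param);

-- Variant B: with the extra column (the fast one in the question)
SELECT v1.field1, t2.field2, v1.fieldWhichAffectsPerformance
FROM [view] v1 WITH (nolock)
INNER JOIN [table] t1 WITH (nolock) ON v1.field1 = t1.field1
INNER JOIN table2 t2 WITH (nolock) ON t2.field2 = t1.field2
INNER JOIN function1(@param) f1 ON f1.field3 = t2.field3
WHERE (v1.date1 = @param OR v1.date2 = @param);

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the join order and the scan/seek operators in the two XML plans shows where the plans diverge.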

One possibility is the order of processing. The select could equivalently be written as:

SELECT t1.field1, t2.field2

because the ON condition specifies that field1 is the same in the view and in t1. The optimizer may recognize that the OR prevents the use of indexes on the view (which are probably not applicable anyway). So, instead of scanning the view, it decides to scan t1 and then bring in the view.

By including an additional column from the view in the SELECT list, you are pushing the optimizer to scan the view, and this might be the better execution plan.

This is all hypothetical, but it gives a mechanism for how your observed timings could happen.
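If the join order really is the cause, one way to test the idea is the `FORCE ORDER` query hint, which tells SQL Server to keep the join order as written. This is only a sketch under that assumption, not a confirmed fix: the object names are the question's placeholders (bracketed because `view` and `table` are reserved words), the type of `@param` is assumed, and whether the hint reproduces the faster plan has to be checked against the actual execution plan.

```sql
-- Assumption: @param is a datetime; the value is arbitrary.
DECLARE @param datetime = '20171228';

-- The extra column is omitted, but the view stays first in the FROM clause
-- and FORCE ORDER asks the optimizer to preserve the written join order.
SELECT v1.field1, t2.field2
FROM [view] v1 WITH (nolock)
INNER JOIN [table] t1 WITH (nolock) ON v1.field1 = t1.field1
INNER JOIN table2 t2 WITH (nolock) ON t2.field2 = t1.field2
INNER JOIN function1(@param) f1 ON f1.field3 = t2.field3
WHERE (v1.date1 = @param OR v1.date2 = @param)
OPTION (FORCE ORDER);
```

Hints like this override the optimizer, so the plan can become worse as data volumes change; it is best treated as a diagnostic step before settling on a permanent rewrite.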
