简体   繁体   中英

Subquery Caching with Sql Server 2008

I am creating a stored procedure with Sql Server 2008 which will return 2 result sets. The first query returns a result set that I would like to resuse as in the second query as a subquery (see example below). However, since the first query and the subquery essentially return the same data, I was wondering if there is some caching meachanism that I can use. Is it possible to do that? I am trying to optimize for performance.

SELECT * 
FROM   Employees
WHERE  BossId = 1

SELECT * 
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId 
    FROM   Employees
    WHERE  BossId = 1
)

PS The example is a simplified problem.

You can cache CTEs by reusing the query plan. This requires injecting the Eager Spool between the resultset produced by the function. Quassnoi makes use of it in this article , but I can't find a better example at this time. Here's another good read on Eager Spool .

As far as I know you would need to use either a temp table or table variable for this. A comparison of the two is here.

The below uses the OUTPUT clause to fill the table variable and select from it in one statement.

declare @MatchingResults table
(
EmployeeId int primary key --Other Columns
)

INSERT INTO @MatchingResults
OUTPUT INSERTED.*
SELECT EmployeeId  --Other Columns
FROM   Employees
WHERE  BossId = 1


SELECT * 
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId 
    @MatchingResults))

Table Variables are your best option. You can also improve performance by using the exists operator for the subquery rather than in :

-- obviously the columns should match your Employees table
declare @results table (
    employeeId int,
    column1 varchar,
    column2 int
)

insert into @results
select * from Employees
where BossId = 1

-- using exists/not exists performs much better than in
select * from CostCenters
where exists ( select 0
               from @results as r
               where CostCenters.employeeId = r.employeeId )

Caching the data of the first query will probably NOT result in better performance. When SQL Server receives the query it breaks it down to simple steps, chooses the proper indexes and operators and retrieves the data using those indexes. By storing the first query's data in a table variable or temporary table you are preventing SQL Server from using any indexes on the Employees table.

If you rewrite your query to its equivalent using JOIN it's easier to see what happens

SELECT c.* 
FROM   CostCenters c INNER JOIN Employees e on c.EmployeeId=e.EmployeeId
WHERE e.BossId=1

When SQL Server sees this query it will check the statistics of the tables. If BossId is a highly selective indexed column it may first try to filter by this. Otherwise it will use any indexes on the EmployeeId columns to limit rows from both tables to a minimum, then BossId to find the proper rows and return them.

Filtering operations on indexes are quite fast as the indexes contain only a subset of the row data, are easier to cache in memory and have a physical structure that allows quick searching.

You really shouldn't try to second-guess SQL Server's query optimizer before you encounter an actual performance problem. Most of the time you will prevent it from selecting the best execution plan and result in worse performance

The best solution I can think of is to go with CTE

http://msdn.microsoft.com/en-us/library/ms190766.aspx

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM