Subquery Caching with Sql Server 2008

I am creating a stored procedure with Sql Server 2008 which will return 2 result sets. The first query returns a result set that I would like to reuse in the second query as a subquery (see example below). However, since the first query and the subquery essentially return the same data, I was wondering if there is some caching mechanism that I can use. Is it possible to do that? I am trying to optimize for performance.

SELECT * 
FROM   Employees
WHERE  BossId = 1

SELECT * 
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId 
    FROM   Employees
    WHERE  BossId = 1
)

PS: The example is a simplified problem.

You can cache CTEs by reusing the query plan. This requires injecting an Eager Spool into the plan for the resultset produced by the function. Quassnoi makes use of it in this article, but I can't find a better example at this time. Here's another good read on Eager Spool.

As far as I know, you would need to use either a temp table or a table variable for this. A comparison of the two is here.

The code below uses the OUTPUT clause to fill the table variable and return the inserted rows as the first result set in a single statement.

declare @MatchingResults table
(
EmployeeId int primary key --Other Columns
)

INSERT INTO @MatchingResults
OUTPUT INSERTED.*
SELECT EmployeeId  --Other Columns
FROM   Employees
WHERE  BossId = 1


SELECT * 
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId 
    FROM   @MatchingResults
)
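
The answer above also names a temp table as the alternative to a table variable. A minimal sketch of that variant follows, assuming the same Employees and CostCenters tables as the question; the name #MatchingEmployees is hypothetical and the column list is illustrative.

```sql
-- Materialize the matching employees once.
-- #MatchingEmployees is a hypothetical name; add other columns as needed.
SELECT EmployeeId
INTO   #MatchingEmployees
FROM   Employees
WHERE  BossId = 1;

-- First result set: read the rows back from the temp table.
SELECT *
FROM   #MatchingEmployees;

-- Second result set: reuse the same rows as the subquery source.
SELECT *
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId
    FROM   #MatchingEmployees
);

-- Optional inside a stored procedure, where the temp table is
-- dropped automatically when the procedure ends.
DROP TABLE #MatchingEmployees;
```

Unlike the OUTPUT version, this costs one extra read of the temp table for the first result set, so it is worth measuring both before choosing.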

Table variables are your best option. You can also improve performance by using the exists operator for the subquery rather than in:

-- obviously the columns should match your Employees table
declare @results table (
    employeeId int,
    column1 varchar,
    column2 int
)

insert into @results
select * from Employees
where BossId = 1

-- using exists/not exists performs much better than in
select * from CostCenters
where exists ( select 0
               from @results as r
               where CostCenters.employeeId = r.employeeId )

Caching the data of the first query will probably NOT result in better performance. When SQL Server receives the query, it breaks it down into simple steps, chooses the proper indexes and operators, and retrieves the data using those indexes. By storing the first query's data in a table variable or temporary table, you are preventing SQL Server from using any indexes on the Employees table.

If you rewrite your query to its equivalent using JOIN, it's easier to see what happens:

SELECT c.* 
FROM   CostCenters c INNER JOIN Employees e on c.EmployeeId=e.EmployeeId
WHERE e.BossId=1

When SQL Server sees this query, it will check the statistics of the tables. If BossId is a highly selective indexed column, it may first try to filter by this. Otherwise it will use any indexes on the EmployeeId columns to limit rows from both tables to a minimum, then use BossId to find the proper rows and return them.
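
For illustration, the indexes this paragraph refers to could be declared roughly as follows. This is only a sketch: the index names are hypothetical, and it assumes no equivalent indexes (for example a primary key on Employees.EmployeeId) already exist on these tables.

```sql
-- Hypothetical index supporting the filter on BossId.
CREATE NONCLUSTERED INDEX IX_Employees_BossId
    ON Employees (BossId);

-- Hypothetical index supporting the join on EmployeeId.
CREATE NONCLUSTERED INDEX IX_CostCenters_EmployeeId
    ON CostCenters (EmployeeId);
```

Whether the optimizer actually uses them depends on the table statistics, as described above.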

Filtering operations on indexes are quite fast, as the indexes contain only a subset of the row data, are easier to cache in memory, and have a physical structure that allows quick searching.

You really shouldn't try to second-guess SQL Server's query optimizer before you encounter an actual performance problem. Most of the time you will prevent it from selecting the best execution plan, which results in worse performance.
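
One way to check whether there is an actual problem is to measure the plain query first. The sketch below uses the standard SET STATISTICS IO and SET STATISTICS TIME session options around the JOIN form of the query; the table and column names are those from the question.

```sql
-- Report logical/physical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT c.*
FROM   CostCenters c
       INNER JOIN Employees e ON c.EmployeeId = e.EmployeeId
WHERE  e.BossId = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the table-variable or temp-table variant under the same settings gives numbers that can be compared directly, rather than guessing which form is faster.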

The best solution I can think of is to go with a CTE:

http://msdn.microsoft.com/en-us/library/ms190766.aspx
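
A minimal sketch of what the CTE form might look like for the second query, assuming the tables from the question; the CTE name MatchingEmployees is hypothetical.

```sql
-- MatchingEmployees is a hypothetical name for the shared filter.
WITH MatchingEmployees AS (
    SELECT EmployeeId
    FROM   Employees
    WHERE  BossId = 1
)
SELECT c.*
FROM   CostCenters c
WHERE  c.EmployeeId IN (
    SELECT EmployeeId
    FROM   MatchingEmployees
);
```

Note that a CTE is scoped to the single statement that follows it, so it cannot be shared between the two result sets of the procedure, and SQL Server expands it into the query rather than storing its rows.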
