SQL Server slow select from large table

Question

I have a table with about 20+ million records.

Structure is like:

EventId UNIQUEIDENTIFIER
SourceUserId UNIQUEIDENTIFIER
DestinationUserId UNIQUEIDENTIFIER
CreatedAt DATETIME
TypeId INT
MetaId INT

The table receives about 100k+ records each day.

I have indexes on each column except MetaId, as it is not used in WHERE clauses.

The problem is when I want to pick up, e.g., the latest 100 records for a desired SourceUserId.

The query sometimes takes up to 4 minutes to execute, which is not acceptable.

E.g.:

SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND 
(
 TypeId IN (2, 3, 4)
    OR 
 (TypeId = 60 AND SrcMemberId != DstMemberId)
)
ORDER BY CreatedAt DESC

I can't do partitioning etc., as I am using the Standard edition of SQL Server and Enterprise is too expensive.

I also think that the table is quite small to be that slow.

I think the problem is with the ORDER BY clause, as the db must go through a much bigger set of data.

Any ideas how to make it quicker?

Perhaps a relational database is not a good idea for that kind of data.

Data is always picked up ordered by CreatedAt DESC.

Thank you for reading.

PabloX

Answer 1

You'll likely want to create a composite index for this type of query. When the query runs slowly, it is most likely choosing to scan down an index on the CreatedAt column and perform a residual filter on the SourceUserId value, when what you really want is to jump directly to all records for a given SourceUserId, already ordered properly. To achieve this, create a composite index primarily on SourceUserId (for the equality check) and secondarily on CreatedAt (to preserve the order within a given SourceUserId value). You may want to try adding TypeId as well, depending on the selectivity of this column.

So, the two indexes that will most likely give the best repeatable performance (try them out and compare) would be:

Index on (SourceUserId, CreatedAt)
Index on (SourceUserId, TypeId, CreatedAt)
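
As a rough sketch, the two candidates could be created as follows. The index names and the dbo schema are assumptions made for illustration; the key columns come from the list above.

```sql
-- Hypothetical index names; dbo schema assumed.
-- Candidate 1: seek on SourceUserId, rows already ordered by CreatedAt.
-- An ascending key is enough: SQL Server can read the range backwards
-- to satisfy ORDER BY CreatedAt DESC.
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_CreatedAt
    ON dbo.Events (SourceUserId, CreatedAt);

-- Candidate 2: adds TypeId between the equality column and the sort column.
-- With several TypeId values in the predicate, the rows are no longer in
-- one CreatedAt-ordered range, so a sort step may still appear in the plan.
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_TypeId_CreatedAt
    ON dbo.Events (SourceUserId, TypeId, CreatedAt);
```

Comparing the actual execution plans and timings of the original query with each index in place is the simplest way to pick one; keeping both costs extra work on every insert.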

As always, there are many other considerations in determining how/what/where to index. As Remus discusses in a separate answer, one big consideration is covering the query vs. keeping lookups. Additionally, you'll need to consider write volumes, possible fragmentation impact (if any), singleton lookups vs. large sequential scans, etc.

Answer 2

I have indexes on each column except MetaId

Non-covering indexes will likely hit the 'tipping point' and the query will revert to a table scan. Just adding an index on every column because it is used in a WHERE clause does not equate to good index design. To take your query as an example, a good 100% covering index would be:

INDEX ON (SourceUserId , CreatedAt) INCLUDE (TypeId, SrcMemberId, DstMemberId)

The following index is also useful, although it is still going to cause lookups:

INDEX ON (SourceUserId , CreatedAt) INCLUDE (TypeId)

And finally, an index without any included columns may help, but is just as likely to be ignored (it depends on the column statistics and cardinality estimates):

INDEX ON (SourceUserId , CreatedAt)

But a separate index on SourceUserId and one on CreatedAt are basically useless for your query.

See Index Design Basics.
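
A minimal sketch of the covering variant in T-SQL follows. The index name, the dbo schema, and the placeholder GUID are assumptions; SrcMemberId and DstMemberId are taken from the question's query, although they do not appear in the posted table structure. Note that the index only covers a query that selects the indexed and included columns; with SELECT *, each of the 100 returned rows would presumably still need a lookup for the remaining columns.

```sql
-- Hypothetical index name; dbo schema assumed.
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_CreatedAt_Covering
    ON dbo.Events (SourceUserId, CreatedAt)
    INCLUDE (TypeId, SrcMemberId, DstMemberId);

-- Placeholder value; substitute a real user id.
DECLARE @SourceUserId UNIQUEIDENTIFIER;
SET @SourceUserId = '00000000-0000-0000-0000-000000000000';

-- Selecting only key and included columns lets the index answer the
-- query on its own, without touching the base table.
SELECT TOP 100 SourceUserId, CreatedAt, TypeId, SrcMemberId, DstMemberId
FROM dbo.Events
WHERE SourceUserId = @SourceUserId
  AND (
        TypeId IN (2, 3, 4)
        OR (TypeId = 60 AND SrcMemberId != DstMemberId)
      )
ORDER BY CreatedAt DESC;
```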

Answer 3

The fact that the table has indexes built on GUID values indicates a possible series of problems that would affect performance:

High index fragmentation: since new GUIDs are generated randomly, the index cannot organize them in sequential order and the nodes are spread unevenly.
High number of page splits: the size of a GUID (16 bytes) causes many page splits in the index, since there's a greater chance that a new value won't fit in the remaining space available in a page.
Slow value comparison: comparing two GUIDs is relatively slow compared with comparing integers, because up to 16 bytes must be matched.
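
One way to check whether fragmentation is actually a problem is to query the sys.dm_db_index_physical_stats function. The sketch below assumes the table lives in the dbo schema of the current database; it only reads statistics and changes nothing.

```sql
-- Resolve the ids into variables first, which also works under older
-- compatibility levels that reject function calls as DMF arguments.
DECLARE @db_id INT;
DECLARE @object_id INT;
SET @db_id = DB_ID();
SET @object_id = OBJECT_ID(N'dbo.Events');  -- dbo schema is an assumption

SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```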

Here are a couple of resources on how to investigate and resolve these problems:

Answer 4

I would recommend getting the data into two separate table variables:

INSERT INTO @Table1
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND 
(
 TypeId IN (2, 3, 4)
)
INSERT INTO @Table2
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND 
(
 (TypeId = 60 AND SrcMemberId != DstMemberId)
)

Then apply a UNION over the two selects, with ORDER BY and TOP. Limit the data from the get-go.
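
A fuller sketch of this idea, with the table variables declared and the final step written out, might look like the following. The column subset, the placeholder GUID, and the dbo schema are assumptions; each insert is capped at 100 rows so that only candidate rows are copied.

```sql
DECLARE @SourceUserId UNIQUEIDENTIFIER;
SET @SourceUserId = '00000000-0000-0000-0000-000000000000';  -- placeholder

-- Only a subset of columns is kept here for brevity (an assumption).
DECLARE @Table1 TABLE (EventId UNIQUEIDENTIFIER, CreatedAt DATETIME, TypeId INT);
DECLARE @Table2 TABLE (EventId UNIQUEIDENTIFIER, CreatedAt DATETIME, TypeId INT);

-- Latest 100 rows matching the first branch of the OR.
INSERT INTO @Table1 (EventId, CreatedAt, TypeId)
SELECT TOP 100 EventId, CreatedAt, TypeId
FROM dbo.Events
WHERE SourceUserId = @SourceUserId
  AND TypeId IN (2, 3, 4)
ORDER BY CreatedAt DESC;

-- Latest 100 rows matching the second branch of the OR.
INSERT INTO @Table2 (EventId, CreatedAt, TypeId)
SELECT TOP 100 EventId, CreatedAt, TypeId
FROM dbo.Events
WHERE SourceUserId = @SourceUserId
  AND TypeId = 60
  AND SrcMemberId != DstMemberId
ORDER BY CreatedAt DESC;

-- Combine at most 200 candidates and keep the latest 100 overall.
SELECT TOP 100 u.EventId, u.CreatedAt, u.TypeId
FROM (
    SELECT EventId, CreatedAt, TypeId FROM @Table1
    UNION ALL
    SELECT EventId, CreatedAt, TypeId FROM @Table2
) AS u
ORDER BY u.CreatedAt DESC;
```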

Answer 5

I suggest using a UNION:

SELECT TOP 100 x.*
  FROM (SELECT a.*
          FROM EVENTS a
         WHERE a.typeid IN (2, 3, 4)
        UNION ALL
        SELECT b.*
          FROM EVENTS b
         WHERE b.typeid = 60 
           AND b.srcmemberid != b.dstmemberid) x
 WHERE x.sourceuserid = '15b534b17-5a5a-415a-9fc0-7565199c3461'
 ORDER BY x.createdat DESC

Answer 6

We've realised a minor gain by moving to a BIGINT IDENTITY key for our event table; by using that as a clustered primary key, we can cheat and use that for date ordering.
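
As a sketch of what that could look like, assuming rows are inserted roughly in CreatedAt order so that the identity value can stand in for the date: the table name, the EventSeq column, the nullability choices, and the placeholder GUID are all hypothetical.

```sql
-- Hypothetical table and column names; nullability is assumed.
CREATE TABLE dbo.EventsBySeq (
    EventSeq          BIGINT IDENTITY(1, 1) NOT NULL,
    EventId           UNIQUEIDENTIFIER NOT NULL,
    SourceUserId      UNIQUEIDENTIFIER NOT NULL,
    DestinationUserId UNIQUEIDENTIFIER NOT NULL,
    CreatedAt         DATETIME NOT NULL,
    TypeId            INT NOT NULL,
    MetaId            INT NULL,
    CONSTRAINT PK_EventsBySeq PRIMARY KEY CLUSTERED (EventSeq)
);

DECLARE @SourceUserId UNIQUEIDENTIFIER;
SET @SourceUserId = '00000000-0000-0000-0000-000000000000';  -- placeholder

-- Ordering by the clustered identity key approximates "latest first",
-- but only if insertion order matches CreatedAt order.
SELECT TOP 100 *
FROM dbo.EventsBySeq
WHERE SourceUserId = @SourceUserId
ORDER BY EventSeq DESC;
```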

Answer 7

I would make sure CreatedAt is indexed properly.

Answer 8

You could split the query in two with a UNION to avoid the OR (which can cause your index not to be used), something like:

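-- Each branch takes its own latest 100 rows; the outer query keeps the latest 100 overall.
SELECT TOP 100 * FROM (
    SELECT * FROM (
        SELECT TOP 100 * FROM Events WITH (NOLOCK)
        WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
          AND TypeId IN (2, 3, 4)
        ORDER BY CreatedAt DESC
    ) AS a
    UNION ALL
    SELECT * FROM (
        SELECT TOP 100 * FROM Events WITH (NOLOCK)
        WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
          AND TypeId = 60 AND SrcMemberId != DstMemberId
        ORDER BY CreatedAt DESC
    ) AS b
) AS x
ORDER BY CreatedAt DESC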

Also, check that the uniqueidentifier indexes are not CLUSTERED.
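
One way to check this is to list the table's indexes with their key column types from the catalog views. This is a sketch that assumes the table is dbo.Events; a row with type_desc = 'CLUSTERED' and column_type = 'uniqueidentifier' would be the case to look at.

```sql
SELECT i.name  AS index_name,
       i.type_desc,                 -- CLUSTERED or NONCLUSTERED
       ic.key_ordinal,
       c.name  AS column_name,
       t.name  AS column_type
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
JOIN sys.types AS t
  ON t.user_type_id = c.user_type_id
WHERE i.object_id = OBJECT_ID(N'dbo.Events')   -- dbo schema is an assumption
ORDER BY i.index_id, ic.key_ordinal;
```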

Answer 9

If 100K records are added each day, you should check your index fragmentation, and rebuild or reorganize accordingly. More info: SQLauthority
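
For illustration, the two maintenance operations look like this in T-SQL. The index name is hypothetical, and the thresholds in the comments are a commonly cited rule of thumb rather than a fixed rule.

```sql
-- Hypothetical index name; dbo schema assumed.

-- Light fragmentation (often quoted as roughly 5-30%):
-- reorganize, which defragments the leaf level in place.
ALTER INDEX IX_Events_SourceUserId_CreatedAt ON dbo.Events REORGANIZE;

-- Heavy fragmentation (often quoted as above roughly 30%):
-- rebuild, which recreates the index from scratch.
ALTER INDEX IX_Events_SourceUserId_CreatedAt ON dbo.Events REBUILD;
```

On Standard edition the rebuild runs offline, since ONLINE = ON is an Enterprise feature, so it is usually scheduled for a quiet period.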

9 answers

Answer 1: score 15, accepted, 2009-12-02 19:31:26
Answer 2: score 6, 2009-12-02 19:35:35
Answer 3: score 5, 2009-12-02 20:03:45
Answer 4: score 1, 2009-12-02 19:32:56
Answer 5: score 1, 2009-12-02 19:34:29
Answer 6: score 1, 2009-12-02 20:27:53
Answer 7: score 0, 2009-12-02 19:28:12
Answer 8: score 0, 2009-12-02 19:32:18
Answer 9: score 0, 2009-12-02 19:39:22
