
How to re-index an AuditLog table? Non-clustered primary key, clustered covering index, GUID

Using SQL Server 2016 Standard. I have an existing AuditLog table, with a PK on a bigint column (generated C# side) and an additional index.

CREATE TABLE [dbo].[AuditLog]
(
    [Id] [bigint] NOT NULL,
    [ChangeTime] [datetime] NOT NULL,
    [User] [varchar](100) NOT NULL,
    [RootId] [bigint] NOT NULL,
    [EntityId] [bigint] NOT NULL,
    [EntityName] [varchar](100) NOT NULL,
    [Operation] [varchar](100) NOT NULL,
    [OldValue] [varchar](max) NULL,
    [NewValue] [varchar](max) NULL
)

ALTER TABLE [dbo].[AuditLog] 
    ADD CONSTRAINT [PK_AuditLog] 
        PRIMARY KEY CLUSTERED ([Id] ASC)

CREATE NONCLUSTERED INDEX [IX_AuditLog_RootId] 
    ON [dbo].[AuditLog] ([RootId] ASC)

With the current 105,000,000 rows, the sizes are (using used_page_count * 8 KB per page):

  • PK_AuditLog: 11,535,112 KB
  • IX_AuditLog_RootId: 2,370,480 KB
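
Figures like these can be reproduced with a query along the following lines. This is only a sketch: it assumes the numbers come from sys.dm_db_partition_stats, which exposes a used_page_count column, but the question does not say which view was actually queried.

```sql
-- Per-index size in KB for dbo.AuditLog, computed as used pages * 8 KB.
-- Assumption: sys.dm_db_partition_stats is the source of used_page_count.
SELECT
    i.name                      AS IndexName,
    SUM(ps.used_page_count) * 8 AS UsedKB
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.AuditLog')
GROUP BY i.name
ORDER BY UsedKB DESC;
```

Running the same query against a candidate schema loaded with a copy of the data gives a like-for-like size comparison.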

I now have to create rows in this table from a stored procedure in SQL, no longer only from C#, so I need a primary key that can be generated on the SQL side (and still from C#). I think my choices are int identity and guid (with a NEWSEQUENTIALID default).

Since most of my usages include the date and ordering by date, I'm thinking of clustering on that. Sounds right?

And since I almost always filter by RootId and User, I'm thinking of including them in my index. Is it a good idea to include the other columns in the clustered index? Or should they be in a separate covering index?

Every index needs to identify rows uniquely, so my clustered index will include the primary key even if I don't specify it. So using a Guid as the PK seems a bad idea for storage, particularly with 100 million rows. So I'm using a bigint.

Since my PK is not clustered (and therefore not stored physically in that order), how does SQL Server work out the next identity? I doubt it sorts the PK to find the max value. Is using identity on a nonclustered column a bad idea?

Also, I guess I could use datetime2 with precision 3 (storage 7 bytes) instead of datetime (8 bytes) to keep the same precision but save a bit of space (or even precision 4, to increase precision for the same storage)?

So I'm thinking of doing:

CREATE TABLE dbo.AuditLog
(
    Id bigint NOT NULL IDENTITY (1, 1),
    ChangeTime datetime2(4) NOT NULL...


ALTER TABLE AuditLog   
    ADD CONSTRAINT [PK_AuditLog] 
        PRIMARY KEY NONCLUSTERED (Id)

CREATE CLUSTERED INDEX CIX_AuditLog_ChangetimeRootUser 
    ON AuditLog(Changetime, RootId, [User])

Footnote

This is how the table is used:

  • No foreign keys to or from this table.

  • insert heavy (any add/edit/delete of user entity fields inserts a new AuditLog row, constantly during business hours; must be fast)

  • occasional reads (users check what or who changed something, i.e., read the AuditLog, a few times a day; it would be nice not to wait ages for a query to return)

  • AuditLog rows are never updated nor deleted once inserted.

Typical filters and order:

  • filter by date only
  • filter by date and user
  • filter by date and objectId
  • filter by date and user and objectId
  • filter by objectId only
  • almost always sorted by reverse date, to show most recent changes first.
  • often used with paging, using "offset x rows" and "fetch next x rows only"
  • and a specific use case, which amounts to selecting a subset of PKs using a where clause, and then self-joining on the main table using the PK to retrieve column values
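
To make the access pattern concrete, the "filter by date and user, newest first, with paging" case might look like the sketch below. The variable names and sample values are hypothetical, and the Id tiebreaker in the ORDER BY is an added assumption so that rows sharing the same ChangeTime page deterministically.

```sql
-- Hypothetical parameters; in practice these would come from the caller.
DECLARE @From       datetime     = '2024-01-01';
DECLARE @To         datetime     = '2024-02-01';
DECLARE @User       varchar(100) = 'jdoe';
DECLARE @PageNumber int          = 1;    -- 1-based page index
DECLARE @PageSize   int          = 50;

SELECT Id, ChangeTime, [User], RootId, EntityId, EntityName, Operation
FROM dbo.AuditLog
WHERE ChangeTime >= @From
  AND ChangeTime <  @To
  AND [User] = @User
ORDER BY ChangeTime DESC, Id DESC          -- Id breaks ties (assumption)
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

Comparing the actual execution plan of a query like this under each candidate index layout is one way to judge which layout serves the reads best.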

PS: I'm clear on the process and the time it will take: create a temporary new table, copy the data in chunks, create indexes, etc.
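
For readers less familiar with that process, the chunked copy could be sketched roughly as follows. Everything here is an assumption for illustration: the target table name dbo.AuditLog_New, the batch size, and the premise that the target already exists with the same column names and an IDENTITY property on Id. Rows written to the source while the copy runs would still need a final catch-up pass.

```sql
-- Hypothetical target: dbo.AuditLog_New, assumed to exist already with the
-- same column names as dbo.AuditLog and an IDENTITY property on Id.
DECLARE @BatchSize int = 100000;   -- assumed batch size; tune as needed
DECLARE @LastId    bigint;
DECLARE @NextId    bigint;

-- Start just below the smallest existing key (NULL if the source is empty).
SELECT @LastId = MIN(Id) - 1 FROM dbo.AuditLog;

-- Allow explicit values in the identity column for this session.
SET IDENTITY_INSERT dbo.AuditLog_New ON;

WHILE 1 = 1
BEGIN
    -- Upper key of the next batch, found by walking the source in key order.
    SELECT @NextId = MAX(b.Id)
    FROM (SELECT TOP (@BatchSize) Id
          FROM dbo.AuditLog
          WHERE Id > @LastId
          ORDER BY Id) AS b;

    IF @NextId IS NULL BREAK;      -- nothing left to copy

    INSERT INTO dbo.AuditLog_New
        (Id, ChangeTime, [User], RootId, EntityId, EntityName,
         Operation, OldValue, NewValue)
    SELECT Id, ChangeTime, [User], RootId, EntityId, EntityName,
           Operation, OldValue, NewValue
    FROM dbo.AuditLog
    WHERE Id > @LastId
      AND Id <= @NextId;

    SET @LastId = @NextId;
END;

SET IDENTITY_INSERT dbo.AuditLog_New OFF;
```

Batching by key range rather than by OFFSET keeps each batch a cheap range seek on the existing clustered primary key, and each INSERT commits on its own, which limits transaction log growth.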

Since most of my usages include the date and ordering by date, I'm thinking of clustering on that. Sounds right?

There is no way to know without doing it and evaluating the results.

Is it a good idea to include the other columns in the clustered index?

You cannot include columns in a clustered index, and it would make little sense to: the clustered index is ultimately the table, so its leaf level already holds every column. You include columns in a nonclustered (NC) index to avoid an additional lookup to access other columns of the rows.
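
As an illustration of that last point, a nonclustered index can carry extra columns at its leaf level through INCLUDE. The index name and the choice of key and included columns below are hypothetical, picked only to match the RootId filter mentioned in the question; whether such an index pays for its insert cost has to be measured.

```sql
-- Hypothetical covering index: keyed on RootId then ChangeTime (newest first),
-- carrying a few extra columns so matching queries avoid a key lookup.
CREATE NONCLUSTERED INDEX IX_AuditLog_RootId_ChangeTime
    ON dbo.AuditLog (RootId ASC, ChangeTime DESC)
    INCLUDE ([User], EntityName, Operation);

-- A query that reads only key and included columns can be answered
-- from this index alone.
DECLARE @RootId bigint = 42;   -- sample value for illustration

SELECT ChangeTime, [User], EntityName, Operation
FROM dbo.AuditLog
WHERE RootId = @RootId
ORDER BY ChangeTime DESC;
```

OldValue and NewValue are deliberately left out of the INCLUDE list in this sketch, since copying varchar(max) columns into a second index would add considerably to its size.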

How does SQL Server work out the next identity?

Quite frankly, don't worry about it. The engine manages the identity at the table level; it does not need to refer to any specific rows to determine the next value.
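
The table-level identity state can be inspected directly, without touching any rows. The sketch below assumes Id has already been converted to an IDENTITY column, and the values in the INSERT are placeholders.

```sql
-- Last identity value generated for the table, in any session or scope.
SELECT IDENT_CURRENT(N'dbo.AuditLog') AS LastIdentityValue;

-- Reports the current identity value without changing it.
DBCC CHECKIDENT (N'dbo.AuditLog', NORESEED);

-- Inside a stored procedure, the insert simply omits Id and lets the
-- engine assign it. The values below are placeholders.
INSERT INTO dbo.AuditLog
    (ChangeTime, [User], RootId, EntityId, EntityName,
     Operation, OldValue, NewValue)
VALUES
    (SYSDATETIME(), 'jdoe', 1, 1, 'Customer', 'Edit', 'old', 'new');

-- Identity value assigned by the insert above, in this scope.
SELECT SCOPE_IDENTITY() AS NewId;
```

None of these calls depend on which index is clustered, consistent with the point that the next value is not derived from the stored rows.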

Also, I guess I could use datetime2 with precision 3 (storage 7 bytes) instead of datetime (8 bytes) to keep the same precision but save a bit of space (or even precision 4, to increase precision for the same storage)?

DO NOT handicap your data just to save a single byte per row. Choose the correct datatype according to your requirements. Storage is cheap. A lack of precision is forever.
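
To see what each candidate type actually stores and preserves before deciding, a quick comparison such as the following may help. It is a sketch only; the column aliases are made up for the example.

```sql
-- Capture one high-precision timestamp and cast it to each candidate type.
DECLARE @now datetime2(7) = SYSDATETIME();

SELECT
    DATALENGTH(CAST(@now AS datetime))     AS datetime_bytes,
    DATALENGTH(CAST(@now AS datetime2(3))) AS datetime2_3_bytes,
    DATALENGTH(CAST(@now AS datetime2(4))) AS datetime2_4_bytes,
    DATALENGTH(CAST(@now AS datetime2(7))) AS datetime2_7_bytes,
    @now                                   AS original_value,
    CAST(@now AS datetime)                 AS as_datetime,
    CAST(@now AS datetime2(3))             AS as_datetime2_3,
    CAST(@now AS datetime2(4))             AS as_datetime2_4;
```

The byte counts show the storage side of the trade-off, and the cast values show how much of the original timestamp each type keeps.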

In addition, your footnotes are not clear. You refer to add/update/delete of user entity fields (which is a meaningless term to those unfamiliar with your schema) and also to "never updated nor deleted". That seems to be a contradiction, which may or may not be relevant.

And one final comment. Change involves risk. If your current schema is sufficient, then the safest approach is to simply recreate your table with your ID column as an identity (and everything else remains the same).
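
A minimal sketch of that conservative option follows: the same columns, the same clustered primary key and the same RootId index as the existing table, with only the IDENTITY property added. The _New suffixes are hypothetical names chosen to avoid clashing with the existing objects until the tables are swapped.

```sql
-- Same shape as the existing dbo.AuditLog; only Id gains IDENTITY.
-- Object names with the _New suffix are placeholders.
CREATE TABLE dbo.AuditLog_New
(
    Id         bigint IDENTITY(1, 1) NOT NULL,
    ChangeTime datetime     NOT NULL,
    [User]     varchar(100) NOT NULL,
    RootId     bigint       NOT NULL,
    EntityId   bigint       NOT NULL,
    EntityName varchar(100) NOT NULL,
    Operation  varchar(100) NOT NULL,
    OldValue   varchar(max) NULL,
    NewValue   varchar(max) NULL,
    CONSTRAINT PK_AuditLog_New PRIMARY KEY CLUSTERED (Id ASC)
);

CREATE NONCLUSTERED INDEX IX_AuditLog_New_RootId
    ON dbo.AuditLog_New (RootId ASC);
```

When existing rows are copied in with SET IDENTITY_INSERT ON, SQL Server raises the table's current identity value to the largest Id inserted, so new rows continue above the copied range.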
