SQL Primary Keys

A co-worker and I are arguing over which way is better for generating primary keys that are GUIDs.

We are using .NET 4.0 with Entity Framework 4, and stored procedures for selects, inserts, and updates.

He wants to create the GUID primary key in code, using the Guid class and/or a custom sequential GUID class, and pass it in as part of the insert.

I want the GUID to be created by SQL Server on insert, using either newid() or newsequentialid().

My argument against his way is that if you have to do multiple inserts, you have to make a round trip to get a GUID for each insert so that you maintain the relationships for your foreign key constraints. Plus, this way you have to make several round trips for each insert.

His argument against using SQL for this is that he doesn't have access to the key before the insert happens and has to wait for the insert to get the primary key GUID back for use in other parts of the code. With the SQL approach, you can make one connection to a stored procedure and it handles all the inserts.
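
The co-worker's point, that a client-generated key is known before any insert runs, can be sketched roughly as follows. This is only an illustration: Python's uuid.uuid4() stands in for .NET's Guid.NewGuid(), and build_order, the row dictionaries, and the column names (OrderId, OrderItemId, Customer, Product) are hypothetical.

```python
import uuid


def build_order(customer, products):
    """Assign keys on the client so parent and child rows are linked before any insert."""
    order_id = uuid.uuid4()  # random GUID, generated without touching the database
    order_row = {"OrderId": order_id, "Customer": customer}
    item_rows = [
        # Each child row gets its own key and carries the parent's key as its foreign key.
        {"OrderItemId": uuid.uuid4(), "OrderId": order_id, "Product": product}
        for product in products
    ]
    return order_row, item_rows


order, items = build_order("Contoso", ["widget", "gadget"])
print(order["OrderId"], [item["OrderId"] for item in items])
```

With the keys fixed up front, the parent and child rows could in principle be sent to the server together; whether that outweighs the indexing cost of random GUIDs is what the answers discuss.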

So, which method is better for single inserts? Which method is better for multiple inserts in a transaction?

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) is a physical, storage-related thing, and here a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based primary/clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times, a GUID as the clustering key isn't optimal: because of its randomness, it leads to massive page and index fragmentation and to generally bad performance.
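
A toy model can make the fragmentation argument concrete. The sketch below is not how SQL Server's storage engine actually works. It assumes fixed-capacity pages (PAGE_CAPACITY is an arbitrary figure), a 50/50 split whenever a row lands in the middle of a full page, and a fresh page whenever a row is appended at the very end. It compares ever-increasing integer keys with random GUIDs; simulate and all other names are made up for the illustration.

```python
import bisect
import random
import uuid

PAGE_CAPACITY = 100  # keys per page; arbitrary value for this toy model


def simulate(keys):
    """Insert distinct keys into sorted pages; return (mid-page splits, average page fill)."""
    keys = iter(keys)
    pages = [[next(keys)]]   # each page is a sorted list of keys
    lows = [pages[0][0]]     # lows[i] is the smallest key on pages[i]
    splits = 0
    for key in keys:
        # Find the page whose key range should contain the new key.
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[i]
        bisect.insort(page, key)
        lows[i] = page[0]
        if len(page) <= PAGE_CAPACITY:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Append at the very end: start a fresh page, no existing rows move.
            new_page = [page.pop()]
        else:
            # Insert into the middle of a full page: move half of its rows out.
            half = len(page) // 2
            new_page = page[half:]
            del page[half:]
            splits += 1
        pages.insert(i + 1, new_page)
        lows.insert(i + 1, new_page[0])
    fill = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    return splits, fill


N = 20_000
rng = random.Random(42)  # fixed seed so the run is repeatable
random_guids = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(N)]

seq_splits, seq_fill = simulate(range(N))
guid_splits, guid_fill = simulate(random_guids)
print(f"ever-increasing INT: {seq_splits} splits, pages {seq_fill:.0%} full")
print(f"random GUID:         {guid_splits} splits, pages {guid_fill:.0%} full")
```

In this model the ever-increasing keys never cause a split and leave every page full, while the random keys keep splitting pages and leave them partly empty.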

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential, and thus it suffers from the same problems as a random GUID, just a bit less prominently. If you insist on a GUID, then at least use newsequentialid() on the server!

Then there's another issue to consider: the clustering key on a table is also added to each and every entry in each and every non-clustered index on that table - thus you really want to make sure it's as small as possible. Typically, an INT with its 2+ billion possible values should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as primary and clustering key:

  • Base table with 1,000,000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: about 27 MB vs. 107 MB - and that's just on a single table!
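
The figures above can be reproduced with a few lines of arithmetic. This sketch assumes the estimate counts only the key bytes (4 for INT, 16 for GUID) once per row in the base table and once per row in each nonclustered index, with 1 MB taken as 1024 * 1024 bytes; real tables add row and page overhead on top. The totals it prints are simply the sum of the two list entries.

```python
ROWS = 1_000_000
NONCLUSTERED_INDEXES = 6
MB = 1024 * 1024


def key_storage_mb(key_bytes):
    """Return (base table, nonclustered indexes, total) key storage in MB."""
    base = key_bytes * ROWS / MB
    indexes = base * NONCLUSTERED_INDEXES  # the clustering key is repeated in every index entry
    return base, indexes, base + indexes


for name, width in (("INT", 4), ("GUID", 16)):
    base, indexes, total = key_storage_mb(width)
    print(f"{name}: base {base:.2f} MB, indexes {indexes:.2f} MB, total {total:.2f} MB")
```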

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

When I have questions like these, I say to myself "SQL Server is good at sets, so let's let it do what it's good at" and sometimes "1 is just a specific case of N".

Which method is better for single inserts?

The single-insert time will be the same for either of your approaches with a synchronous SQL call. However, "his" approach will give you more problems with seek time down the line, because his sequential GUID method won't be as good as SQL Server's (and you will probably lose the global uniqueness). It will also split your code base when you inevitably need to do multiple inserts.

Which method is better for multiple inserts in a transaction?

If you are comparing a set-based insert (INSERT ... SELECT) with single-row inserts (INSERT INTO), the set-based one is going to win on multiple inserts, because the trip back to the client is going to be the expensive part.

If this were me, I would create a stored procedure that takes a serialized collection of the objects to insert and does an INSERT ... SELECT with an OUTPUT clause (check out "Example B. Using OUTPUT with identity and computed columns" on this page). I would let SQL Server create the GUID (if you are stuck on it), and then either return to the client or run the next statement in the stored procedure to insert child rows based on the output table your insert generated.
