SQL Server - Guid VS. Long

Up until now I've been using the C# "Guid = Guid.NewGuid();" method to generate a unique ID that can be stored as the ID field in some of my SQL Server database tables using Linq to SQL. I've been informed that, for indexing reasons, using a GUID is a bad idea and that I should use an auto-incrementing Long instead. Will using a Long speed up my database transactions? If so, how do I go about generating unique IDs of type Long?

Regards,

Both have pros and cons; which one is better depends entirely on how you use them.

Right off the bat, if you need identifiers that can work across several databases, you need GUIDs. There are some tricks with Long (manually assigning each database a different seed/increment), but these don't scale well.
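To make the seed/increment trick concrete, here is a minimal sketch in Python rather than T-SQL. It assumes exactly three databases that each use a different seed and a shared increment equal to the database count; `identity_values` and `DATABASE_COUNT` are hypothetical names for illustration, not part of SQL Server.

```python
from itertools import islice

def identity_values(seed, increment):
    """Yield values in the order an IDENTITY(seed, increment) column would hand them out."""
    value = seed
    while True:
        yield value
        value += increment

# Assumption: three databases numbered 1..3, all sharing increment 3.
DATABASE_COUNT = 3
ids_by_database = {
    db: list(islice(identity_values(seed=db, increment=DATABASE_COUNT), 5))
    for db in range(1, DATABASE_COUNT + 1)
}

for db, ids in ids_by_database.items():
    print(f"database {db}: {ids}")
```

The three sequences never overlap, but adding a fourth database means changing the increment on every existing database, which is why the trick does not scale well.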

As far as indexing goes, Long will give much better insert performance if the index is clustered (by default primary keys are clustered, but this can be modified for your table), since new rows always land at the end of the index and the table does not need to be reorganized after inserts.
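The effect on a clustered index can be illustrated with a toy model. This is only a sketch, not how SQL Server's storage engine is implemented: it assumes leaf pages hold a fixed number of keys (`PAGE_CAPACITY`, a made-up value), that a full page splits 50/50 when a key lands in the middle, and that appending past the last key just allocates a new page. Random integers stand in for GUIDs, and `simulate_inserts` is a hypothetical helper.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of keys per leaf page

def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy clustered index; return (page_splits, page_count)."""
    pages = [[]]                # leaf pages in key order, each a sorted list
    lows = [float("-inf")]      # lower bound of each page, parallel to pages
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1   # page whose range covers key
        page = pages[i]
        if len(page) >= capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Appending past the end: start a fresh page, move nothing.
                pages.append([key])
                lows.append(key)
                continue
            # Key lands inside a full page: split it in half.
            half = capacity // 2
            right = page[half:]
            del page[half:]
            pages.insert(i + 1, right)
            lows.insert(i + 1, right[0])
            splits += 1
            if key >= right[0]:
                page = right
        bisect.insort(page, key)
    return splits, len(pages)

ROW_COUNT = 20_000
sequential_keys = range(ROW_COUNT)                               # identity-style keys
random_keys = random.Random(0).sample(range(10**9), ROW_COUNT)   # GUID-style keys

seq_splits, seq_pages = simulate_inserts(sequential_keys)
rand_splits, rand_pages = simulate_inserts(random_keys)
print("sequential:", seq_splits, "splits,", seq_pages, "pages")
print("random:    ", rand_splits, "splits,", rand_pages, "pages")
```

In this model the ever-increasing keys cause no splits and fill every page completely, while the random keys cause splits and leave pages partly empty, so the same rows occupy more pages.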

As far as concurrent inserts are concerned, however, Long (identity) columns will be slower than GUIDs: identity column generation requires a series of exclusive locks to ensure that only one row gets the next sequential number. In an environment with many users inserting many rows all the time, this can be a performance hit. GUID generation in this situation is faster.

Storage-wise, a GUID takes up twice the space of a Long (16 bytes vs 8). However, whether those 8 extra bytes make a noticeable difference in how many records fit in one leaf, and thus in the number of leaves pulled from disk during an average request, depends on the overall size of your row.

A long (bigint in SQL Server) is 8 bytes and a Guid is 16 bytes, so you are halving the number of bytes SQL Server has to compare when doing a lookup.

For generating a long, use IDENTITY(1,1) when you create the field in the database.

So, using either CREATE TABLE or ALTER TABLE:

Field_NAME BIGINT NOT NULL PRIMARY KEY IDENTITY(1,1)

See the comments for notes on using this with Linq to SQL.

The "Queen of Indexing", Kim Tripp, basically says it all in her indexing blog posts.

Her best practice is that an optimal clustering key should be:

  • unique
  • small
  • stable (never changing)
  • ever-increasing

GUIDs violate the "small" and "ever-increasing" criteria and are thus not optimal.

PLUS: your clustering key will be added to each and every entry in each and every non-clustered index (as the lookup value used to actually find the record in the database), so you want to make it as small as possible (INT = 4 bytes vs. GUID = 16 bytes). If you have hundreds of millions of rows and several non-clustered indices, choosing an INT or BIGINT over a GUID can make a major difference, even just space-wise.
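A rough back-of-the-envelope version of that space argument, sketched in Python. The row count and index count are made-up figures, `clustering_key_overhead` is a hypothetical helper, and the estimate counts only the raw key bytes: it ignores per-row overhead, fill factor, and compression.

```python
KEY_BYTES = {"INT": 4, "BIGINT": 8, "GUID": 16}

def clustering_key_overhead(rows, nonclustered_indexes, key_type):
    """Bytes spent on the clustering key alone: one copy in the clustered
    index plus one copy per row in every non-clustered index."""
    copies_per_row = 1 + nonclustered_indexes
    return rows * copies_per_row * KEY_BYTES[key_type]

ROWS = 200_000_000   # assumption: "hundreds of millions of rows"
INDEXES = 4          # assumption: "several non-clustered indices"

for key_type in KEY_BYTES:
    gib = clustering_key_overhead(ROWS, INDEXES, key_type) / 2**30
    print(f"{key_type:<6} {gib:5.1f} GiB")
```

Under these assumptions the GUID key costs four times the bytes of an INT and twice those of a BIGINT, and the gap grows with every additional non-clustered index.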

Marc

You can debate GUID or identity all day. I prefer the database to generate the unique value with an identity. If you merge data from multiple databases, add another column (to identify the source database, possibly a tinyint or smallint) and form a composite primary key.

If you do go with an identity, be sure to pick the right datatype, based on the number of keys you expect to generate:

bigint - 8 Bytes - max positive value: 9,223,372,036,854,775,807  
int    - 4 Bytes - max positive value:             2,147,483,647

Note that the "number of expected keys" is different from the number of rows. If you mainly add and keep rows, you may find that an INT is enough, with over 2 billion unique keys. I'll bet your table won't get that big. However, if you have a high-volume table where you keep adding and removing rows, your row count may be low, but you'll go through keys fast. You should do some calculations to see how long it would take to go through INT's 2 billion keys. If you won't use them up any time soon, go with INT; otherwise double the key size and go with BIGINT.
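The calculation suggested above might look like the following sketch. It assumes an identity that starts at 1 and increments by 1 (so only the positive range is used), and the insert rates are made-up examples; `years_until_exhausted` is a hypothetical helper.

```python
INT_MAX = 2**31 - 1       # 2,147,483,647
BIGINT_MAX = 2**63 - 1    # 9,223,372,036,854,775,807
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_until_exhausted(max_key, keys_per_second):
    """Years until an identity starting at 1 reaches max_key at a steady insert rate."""
    return max_key / keys_per_second / SECONDS_PER_YEAR

for rate in (10, 1_000):  # hypothetical sustained inserts per second
    int_years = years_until_exhausted(INT_MAX, rate)
    bigint_years = years_until_exhausted(BIGINT_MAX, rate)
    print(f"{rate:>5}/s  INT: {int_years:.2f} years  BIGINT: {bigint_years:.3g} years")
```

At a steady 10 inserts per second an INT lasts a little under seven years; at 1,000 per second it is gone in under a month, while BIGINT remains effectively inexhaustible at either rate.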

Use Guids when you need to consider import/export to multiple databases. Guids are often easier to use than columns specifying the IDENTITY attribute when working with a dataset of multiple child relationships. This is because you can randomly generate Guids in code while disconnected from the database, and then submit all changes at once. When Guids are generated properly, they are insanely hard to duplicate by chance. With identity columns, you often have to do an initial insert of a parent row and query for its new identity before adding child data. You then have to update all child records with the new parent identity before committing them to the database. The same goes for grandchildren and so on down the hierarchy. It builds up to a lot of work that seems unnecessary and mundane. You can do something similar to Guids by coming up with random integers without the IDENTITY specification, but the chance of collision increases greatly as you insert more records over time. (Guid.NewGuid() is similar to a random Int128, which doesn't exist yet.)
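Both points can be sketched in Python. The first function assigns parent and child keys before anything touches the database, using `uuid.uuid4()` as a stand-in for Guid.NewGuid(); the order/line-item shape and the name `build_order` are hypothetical. The second uses the standard birthday-bound approximation to compare collision odds for random 32-bit integers against the 122 random bits in a version 4 GUID.

```python
import math
import uuid

def build_order(line_items):
    """Build a parent row and its child rows with all keys assigned up front."""
    order_id = uuid.uuid4()  # random (version 4) GUID generated client-side
    order = {"id": order_id}
    lines = [
        {"id": uuid.uuid4(), "order_id": order_id, "item": item}
        for item in line_items
    ]
    return order, lines

def collision_probability(key_count, random_bits):
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-key_count * (key_count - 1) / 2 ** (random_bits + 1))

# Parent and children are fully linked without a database round trip.
order, lines = build_order(["widget", "gadget"])
print(order["id"], [line["order_id"] == order["id"] for line in lines])

# Chance of at least one duplicate among 100,000 randomly generated keys.
print("random 32-bit ints:", collision_probability(100_000, 32))
print("version 4 GUIDs:   ", collision_probability(100_000, 122))
```

With 100,000 keys, random 32-bit integers are more likely than not to collide, while the GUID figure is vanishingly small, which is the gap the paragraph above describes.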

I use Byte (TinyInt), Int16 (SmallInt), Int32/UInt16 (Int), and Int64/UInt32 (BigInt) for small lookup lists that do not change, or for data that does not replicate between multiple databases (permissions, application configuration, color names, etc.).

I imagine an index takes about as long to query against regardless of whether you are using a Guid or a long. There are usually other indexed fields in a table that are larger than 128 bits anyway (user names in a user table, for example). The difference between Guids and integers is the size of the index in memory, as well as the time spent populating and rebuilding indexes. The majority of database operations are usually reads; writes are minimal. Concentrate on optimizing reads from the database first, as slow reads usually come from joins that were not optimized properly, improper paging, or missing indexes.

As with anything, the best thing to do is to prove your point. Create a test database with two tables: one with a primary key of integers/longs, and the other with a Guid. Populate each with N million rows. Monitor the performance of each during CRUD operations (create, read, update, delete). You may find that the Guid does carry a performance hit, but an insignificant one.

Servers often run on boxes without debugging environments and other applications taking up CPU, memory, and hard drive I/O (especially with RAID). A development environment only gives you an idea of performance.

