What is the performance hit of using a string type vs a uuid type for a UUID primary key?

Original post: 2017-05-21 20:36:29 | Tags: mysql, postgresql, indexing, primary-key, uuid

Is there much of a speed difference for index lookups when using a string for the primary key versus the actual uuid type, specifically if the string has a prefix like user-94a942de-05d3-481c-9e0c-da319eb69206 (so that the lookup has to traverse 5-6 characters before getting to something unique)?

3 Answers

This is a micro-optimization and is unlikely to cause a real performance problem until you get to enormous scales. Use the key that best fits your design. That said, here are the details...

UUID is a built-in PostgreSQL type. It's basically a 128-bit integer, and it should perform as an index just as well as any other large integer. When this answer was written, core Postgres had no built-in UUID generating function (PostgreSQL 13 and later include gen_random_uuid()). You can install various modules to generate UUIDs on the database, or you can generate them on the client. Generating the UUID on the client moves the extra work (not much of it) away from the server.
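
As a rough illustration of client-side generation, here is a minimal Python sketch using the standard-library `uuid` module. The choice of Python and of a random (version 4) UUID is an assumption; the answer does not name a client language or a UUID version.

```python
import uuid

# Generate a random (version 4) UUID on the client instead of in the database.
new_id = uuid.uuid4()

# A UUID is a 128-bit value; .int exposes it as a plain Python integer.
assert new_id.int.bit_length() <= 128

# The canonical 36-character text form is what would typically be passed
# to the database as the value for a uuid column.
print(new_id.version, str(new_id))
```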

MySQL does not have a built-in UUID type. Instead there's a UUID function which generates a UUID as a string of hex digits. Because it's a string, UUID keys may incur a performance and storage hit. It may also interfere with replication.

The string UUID will be longer: a hex character encodes only 4 bits of data per byte, so a hex string UUID needs 256 bits to store 128 bits of information (and more once the hyphens of the canonical 36-character form are included). This means more storage and memory per column, which can impact performance.
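
The size arithmetic can be checked with a short Python sketch. It uses the UUID from the question and counts only the payload bytes; any per-value overhead a particular storage engine adds (length prefixes, padding) is not modeled here.

```python
import uuid

u = uuid.UUID("94a942de-05d3-481c-9e0c-da319eb69206")

sizes = {
    "binary": len(u.bytes),                              # raw 128-bit value
    "hex": len(u.hex.encode("ascii")),                   # 32 hex digits, no hyphens
    "canonical": len(str(u).encode("ascii")),            # hex digits plus 4 hyphens
    "prefixed": len(("user-" + str(u)).encode("ascii")), # the key form from the question
}

for label, n_bytes in sizes.items():
    print(f"{label:>9}: {n_bytes} bytes ({n_bytes * 8} bits)")
# binary: 16, hex: 32, canonical: 36, prefixed: 41
```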

Normally this would mean comparisons take twice as long, since the key being compared is twice as long. However, UUIDs normally differ within the first few bytes, so the whole UUID does not need to be compared to know they're different. Long story short: comparing string vs binary UUIDs shouldn't cause a noticeable performance difference in a real application... though the fact that MySQL UUIDs are UTF8 encoded might add cost.
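
To make the early-exit argument concrete, the sketch below models a comparison as a plain byte-by-byte scan that stops at the first difference. This is a simplification: real databases may compare strings through a collation, and the second UUID is a made-up value chosen to share its first byte with the one from the question.

```python
import uuid

def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first byte where a and b differ (or the shorter length)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

a = uuid.UUID("94a942de-05d3-481c-9e0c-da319eb69206")  # from the question
b = uuid.UUID("94b00000-0000-4000-8000-000000000000")  # hypothetical neighbour

binary_steps = first_difference(a.bytes, b.bytes)
hex_steps = first_difference(str(a).encode(), str(b).encode())
prefixed_steps = first_difference(("user-" + str(a)).encode(),
                                  ("user-" + str(b)).encode())

print(binary_steps, hex_steps, prefixed_steps)  # 1 2 7
```

In this model the `user-` prefix adds a constant five bytes to every comparison, which is small next to the cost of fetching the index page in the first place.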

Using UUIDs on PostgreSQL is fine; it's a built-in type. MySQL's implementation of UUID keys is pretty incomplete, and I'd steer away from it. Steer away from MySQL while you're at it.

The real problem with UUIDs comes when the table (or at least the index) is too big to be cached in RAM. When this happens, the 'next' uuid needs to be stored into (or fetched from) some random block that is unlikely to be cached. This leads to more and more I/O as the table grows.

AUTO_INCREMENT ids usually don't suffer that I/O growth because INSERTs always go at the 'end' of the table and SELECTs usually cluster near the end. This leads to effective use of the cache, thereby avoiding death-by-I/O.
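
The locality difference can be seen in a toy simulation. The sketch below treats the index as one sorted list cut into fixed-size "pages" and counts how many distinct pages a batch of new inserts lands on. The page size, the key counts, and the model itself are illustrative assumptions, not a description of how InnoDB or PostgreSQL lay out a B-tree.

```python
import bisect
import random

PAGE_SIZE = 100  # hypothetical number of keys per index page

def pages_touched(existing, new_keys):
    """Insert new_keys into a sorted index and count the distinct pages hit."""
    index = sorted(existing)
    touched = set()
    for key in new_keys:
        pos = bisect.bisect_left(index, key)
        index.insert(pos, key)
        touched.add(pos // PAGE_SIZE)
    return len(touched)

rng = random.Random(0)  # fixed seed so runs are repeatable
n_existing, n_new = 100_000, 1_000

# Sequential ids: every new key is larger than all existing ones.
seq_pages = pages_touched(range(n_existing),
                          range(n_existing, n_existing + n_new))

# Random 128-bit keys, standing in for random UUIDs.
rand_pages = pages_touched([rng.getrandbits(128) for _ in range(n_existing)],
                           [rng.getrandbits(128) for _ in range(n_new)])

print("sequential inserts touched", seq_pages, "pages")
print("random inserts touched", rand_pages, "pages")
```

The sequential batch stays on the last handful of pages, while the random batch is scattered across the whole index. Once the index no longer fits in RAM, each of those scattered pages is a potential disk read.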

My UUID blog discusses how to make "Type-1" UUIDs less costly in terms of performance, at least for MySQL.
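
The blog post is not reproduced here, so the following is only an assumption about the kind of technique meant. One commonly used approach is to move the timestamp fields of a version-1 UUID so that the most significant time bits come first, which makes keys generated close together in time sort next to each other. The helper names below are hypothetical.

```python
import uuid

def v1_from_timestamp(ts: int, node: int = 0x123456789ABC) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1 in the top nibble
    clock_seq_hi_variant = 0x80  # RFC 4122 variant bits, clock sequence 0
    clock_seq_low = 0x00
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

def time_ordered_key(u: uuid.UUID) -> bytes:
    """Rearrange a v1 UUID to time_hi, time_mid, time_low, then the rest."""
    b = u.bytes  # layout: time_low[0:4] time_mid[4:6] time_hi[6:8] rest[8:16]
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]

# Two timestamps one tick apart, straddling a rollover of the low 32 bits.
earlier = v1_from_timestamp(0x1_FFFF_FFFF)
later = v1_from_timestamp(0x2_0000_0000)

# In the standard layout the earlier UUID sorts after the later one...
print(earlier.bytes < later.bytes)
# ...while the rearranged 16-byte keys sort in time order.
print(time_ordered_key(earlier) < time_ordered_key(later))
```

The rearranged value is still 16 bytes, so it can be stored in a binary column without losing the compactness discussed above.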

Use the built-in UUID type that maps to a 128-bit int. Not just for performance, but to prevent strings like "password1" from showing up in that column.
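
The same guarantee can be approximated on the client before a value ever reaches the database. The sketch below is a minimal Python analogue, assuming the standard-library `uuid` module; `parse_uuid_key` is a hypothetical helper name. Note that `uuid.UUID` is somewhat lenient about formatting (it also accepts hex without hyphens, for example), so it is not a strict canonical-form check.

```python
import uuid

def parse_uuid_key(value: str) -> uuid.UUID:
    """Hypothetical helper: accept only values that parse as a UUID."""
    return uuid.UUID(value)  # raises ValueError for anything else

good = parse_uuid_key("94a942de-05d3-481c-9e0c-da319eb69206")

try:
    parse_uuid_key("password1")
    rejected = False
except ValueError:
    rejected = True

print(good, rejected)
```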