
The best choice for Person table primary key

What is your choice for the primary key in tables that represent a person (like Client, User, Customer, Employee, etc.)? My first choice would be a social security number (SSN). However, using the SSN has been discouraged because of privacy concerns and differing regulations. An SSN can also change during a person's lifetime, which is another reason against it.

I guess that one of the functions of a well-chosen natural primary key is to avoid duplication. I do not want a person to be registered twice in the database. A surrogate or generated primary key does not help in avoiding duplicate entries. What is the best way to approach this?

What is the best way to guarantee uniqueness of the person entity in your application, and can this be handled at the database level with a primary key or uniqueness constraint?

I don't know which database engine you are using, but (at least with MySQL, see 7.4.1. Make Your Data as Small as Possible), using an integer, the shortest possible, is generally considered best for performance and memory requirements.

I would use an integer, auto_increment, for that primary key.
The idea being:

  • If the PK is short, it helps in identifying each row (it's faster and easier to compare two integers than two long strings).
  • If a column used in foreign keys is short, the foreign keys require less memory, as the value of that column is likely to be stored in several places.

And then set a UNIQUE index on another column, the one that determines uniqueness, if that's possible and/or necessary.
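Here is a minimal sketch of that layout. It uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere; the table name, the column names, and the register helper are all made up for illustration, and national_id just stands in for whatever column determines uniqueness in your case.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- short surrogate key
        national_id TEXT UNIQUE,                        -- column that determines uniqueness
        full_name   TEXT NOT NULL
    )
    """
)


def register(national_id, full_name):
    """Insert a person; return the new id, or None if the unique index rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO person (national_id, full_name) VALUES (?, ?)",
                (national_id, full_name),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


first = register("A-100", "Ann Lee")
second = register("A-100", "Ann Lee")  # same national_id, so the insert is rejected
third = register("B-200", "Bob Ray")
```

Other tables would reference person.id, so the unique column can later be changed or dropped without touching any foreign key.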



As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.

Your real question is how to avoid duplicate entries. In theory, there is no way: two people could be born on the same day, with the same name, live in the same household, and one or the other might not have a social insurance number available (one might be a foreigner visiting the country).

However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries, and how large your userbase is (and thus the likelihood of a collision).
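Because the same person can be typed in slightly differently each time, it usually helps to normalize the values before comparing them. A rough sketch of the idea follows; the function names are invented for this example, the birthdate is assumed to already be an ISO 'YYYY-MM-DD' string, and nothing here handles omitted middle names or multiple phone numbers.

```python
import re
import unicodedata


def normalize_text(value):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def normalize_phone(value):
    """Keep digits only, so differently punctuated numbers compare equal."""
    return re.sub(r"\D", "", value)


def match_key(full_name, birthdate, phone):
    """Build a comparison key; birthdate is assumed to be an ISO date string."""
    return (normalize_text(full_name), birthdate, normalize_phone(phone))


a = match_key("José  O'Neil", "1980-02-29", "(555) 010-1234")
b = match_key("jose o neil", "1980-02-29", "555.010.1234")
same_person_candidate = a == b  # True for these two spellings
```

A key like this could be stored in its own column with a unique index, or only used to flag likely duplicates for a human to review, depending on how strict you need to be.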

Of course, if you can get the SSN/SIN, then use that to determine uniqueness.

What attributes are available to you? Which ones does your application care about? For example, no two people can be born at exactly the same second at exactly the same place, but you probably don't have access to data at that level of accuracy! So you need to decide, from the attributes you intend to model, which ones are sufficient to provide an acceptable level of data integrity. Whatever you choose, you're right to focus on the data integrity aspects of your selection (preventing insertion of multiple rows for the same person).

For joins and foreign keys in other tables, it is best to use a surrogate key.

I've grown to consider the term Primary Key a misnomer, or at best, confusing. Any key, whether you flag it as Primary Key, Alternate Key, Unique Key, or Unique Index, is still a key, and requires that every row in the table contain unique values for the attributes in the key. In that sense, all keys are equivalent. What matters most is whether they are natural keys (dependent on meaningful real-domain model data attributes) or surrogates (independent of real data attributes).

Secondly, what also matters is what you use the key for. Surrogate keys are narrow and simple and never change (there is no reason to, since they don't mean anything), so they are a better choice for joins or for foreign keys in other dependent tables.

But to ensure data integrity, and to prevent insertion of multiple rows for the same domain entity, they are totally useless. For that you need some kind of natural key, chosen from the data you have available and which your application is modeling for some purpose.

The key does not have to be 100% immutable. If, for example, you use name, phone number, and birthdate, then even if a person changes their name or their phone number, you can simply change the value in the table. As long as no other row already has the new values in its key attributes, you are fine.

Even if the key you select only works in 99.9% of the cases (say you are unlucky enough to run into two people who have the same name and phone number and were coincidentally born on the same day), at least 99.9% of your data will be guaranteed to be accurate and consistent. For the rest you can, for example, just add a time to their birthdate to make them unique, or add some other attribute to the key to distinguish them. As long as you don't have to update data values in foreign keys throughout your database because of the change (since you are not using this key as an FK elsewhere), you are not facing any significant issue.
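To make the split concrete, here is a small sketch with Python's sqlite3 module: a surrogate id carries the foreign keys, and a composite UNIQUE constraint on name, phone, and birthdate acts as the natural key. The person and purchase tables and all their columns are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,       -- surrogate key, used for joins
        name      TEXT NOT NULL,
        phone     TEXT NOT NULL,
        birthdate TEXT NOT NULL,
        UNIQUE (name, phone, birthdate)      -- natural key, guards against duplicates
    );
    CREATE TABLE purchase (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        item      TEXT NOT NULL
    );
    """
)

with conn:
    ann_id = conn.execute(
        "INSERT INTO person (name, phone, birthdate) VALUES (?, ?, ?)",
        ("Ann Lee", "5550101234", "1980-02-29"),
    ).lastrowid
    conn.execute(
        "INSERT INTO purchase (person_id, item) VALUES (?, ?)", (ann_id, "book")
    )

# A name change touches only the person row; purchase.person_id is unaffected.
with conn:
    conn.execute("UPDATE person SET name = ? WHERE id = ?", ("Ann Ray", ann_id))

# A second row with the same natural key values is rejected.
try:
    with conn:
        conn.execute(
            "INSERT INTO person (name, phone, birthdate) VALUES (?, ?, ?)",
            ("Ann Ray", "5550101234", "1980-02-29"),
        )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

owner = conn.execute(
    "SELECT p.name FROM purchase AS pu JOIN person AS p ON p.id = pu.person_id"
).fetchone()[0]
```

The purchase row still resolves to the renamed person through the surrogate id, which is the point of keeping the natural key out of the foreign keys.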

Use an autogenerated integer primary key, and then put a unique constraint on anything that you believe should be unique. But SSNs are not unique in the real world, so it would be a bad idea to put a uniqueness constraint on this column, unless you think turning away customers because your database won't accept them is a good business model.

I prefer natural keys, but a person table is a lost cause. SSNs are not unique and not everybody has one.

I'd recommend a surrogate key. Add all the indexes you need for other candidate keys, but keeping business logic out of the key is my recommendation.

I prefer natural keys, when they can be trusted.

Unless you are running a bank or something like that, there is no reason for your clients and users to provide you with a valid SSN, or even necessarily to have one. Thus, for business reasons, you are forced to distrust the SSN in the case you outline. A similar argument would hold for any given natural key to "persons".

You have no choice but to assign an artificial (read "surrogate") key. It might as well be an integer. Make sure it's a big enough integer that you aren't going to need to expand it any time soon.
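As a rough back-of-the-envelope check on "big enough", the snippet below estimates how long a signed 32-bit and a signed 64-bit key would last. The rate of 1,000 inserts per second is purely an assumed figure for illustration; plug in your own.

```python
# Largest values of signed 32-bit and 64-bit integers.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

SECONDS_PER_DAY = 86_400


def days_until_exhausted(max_value, inserts_per_second):
    """Days until an ever-increasing key reaches max_value at a constant insert rate."""
    return max_value / inserts_per_second / SECONDS_PER_DAY


# Assumed, hypothetical load: 1,000 new rows per second.
days_32 = days_until_exhausted(INT32_MAX, 1_000)
years_64 = days_until_exhausted(INT64_MAX, 1_000) / 365

print(f"32-bit key lasts about {days_32:.1f} days")
print(f"64-bit key lasts about {years_64:.2e} years")
```

At that rate the 32-bit key runs out in under a month, while the 64-bit one is effectively inexhaustible; at a few rows per day either is fine.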

To add to @Mark and @Pascal (autoincrement integers are your best bet): SSNs are useful and should be modelled correctly. Security concerns are part of application logic. You can normalize them into a separate table, and you can make them unique by providing a date-issued field.

PS: to those who disagree with the "security in application" point, an enterprise DB will have a granular ACL model, so this won't be a sticking point.
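One way the separate SSN table could look, sketched with Python's sqlite3 module. The person_ssn table, its columns, and the sample values (obviously fake numbers) are assumptions for illustration; uniqueness is placed on the pair (ssn, date_issued) rather than on the number alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,            -- autoincrementing surrogate key
        name TEXT NOT NULL
    );
    CREATE TABLE person_ssn (
        person_id   INTEGER NOT NULL REFERENCES person(id),
        ssn         TEXT NOT NULL,
        date_issued TEXT NOT NULL,           -- ISO date, assumed to be known
        UNIQUE (ssn, date_issued)
    );
    """
)

with conn:
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ann Lee')")
    conn.execute("INSERT INTO person (id, name) VALUES (2, 'Bob Ray')")
    conn.execute("INSERT INTO person_ssn VALUES (1, '000-00-0001', '1980-03-01')")
    # The same number with a different issue date is accepted.
    conn.execute("INSERT INTO person_ssn VALUES (2, '000-00-0001', '2015-06-15')")

# The exact same (ssn, date_issued) pair is rejected.
try:
    with conn:
        conn.execute("INSERT INTO person_ssn VALUES (2, '000-00-0001', '1980-03-01')")
    exact_duplicate_rejected = False
except sqlite3.IntegrityError:
    exact_duplicate_rejected = True

ssn_rows = conn.execute("SELECT COUNT(*) FROM person_ssn").fetchone()[0]
```

Keeping the numbers in their own table also means access to them can be restricted separately from the rest of the person data.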
