简体   繁体   中英

Database Design Primay Key, ID vs String

I am currently planning to develop a music streaming application. And i am wondering what would be better as a primary key in my tables on the server. An ID int or a Unique String.

Methods 1:

Songs Table: SongID (int), Title(string), *Artist**(string), Length(int), *Album**(string)

Genre Table Genre (string), Name(string)

SongGenre: ***SongID****(int), ***Genre****(string)

Method 2

Songs Table: SongID (int), Title(string), *ArtistID**(int), Length(int), *AlbumID**(int)

Genre Table GenreID (int), Name(string)

SongGenre: ***SongID****(int), ***GenreID****(int)

Key: Bold = Primary Key, *Field** = Foreign Key

I'm currently designing using method 2 as I believe it will speed up lookup performance and use less space as an int takes a lot less space then a string.

Is there any reason this isn't a good idea? Is there anything I should be aware of?

你正在做正确的事情 - 身份字段应该是数字而不是基于字符串,既节省空间又出于性能原因(字符串上的匹配键比整数上的匹配慢)。

Is there any reason this isn't a good idea? Is there anything I should be aware of?

Yes. Integer IDs are very bad if you need to uniquely identify the same data outside of a single database. For example, if you have to copy the same data into another database system with potentially pre-existing data or you have a distributed database. The biggest thing to be aware of is that an integer like 7481 has no meaning outside of that one database. If later on you need to grow that database, it may be impossible without surgically removing your data.

The other thing to keep in mind is that integer IDs aren't as flexible so they can't easily be used for exceptional cases. The designers of the Internet Protocol understood this and took precautions by allocating certain blocks of numbers as "special" in one way or another (broadcast IPs, private IPs, network IPs). But that was only possible because there's a protocol surrounding the usage of those numbers. Many databases don't operate within such a well-defined protocol.

FWIW, it's kind of like trying to decide if having a "strongly typed" programming paradigm is better than a "weakly/dynamically typed" programming paradigm. It will depend on what you need to do.

From the software perspective the GUID is better as its unique globally.

Quotes from: Primary Keys: IDs versus GUIDs

Using a GUID as a row identity value feels more natural-- and certainly more truly unique-- than a 32-bit integer. Database guru Joe Celko seems to agree . GUID primary keys are a natural fit for many development scenarios, such as replication, or when you need to generate primary keys outside the database. But it's still a question of balancing the tradeoffs between traditional 4-byte integer IDs and 16-byte GUIDs:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}'
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

My recommendation is: use ids.

You'll be able to rename that "Genre" with 20000 songs without breaking anything.

The idea behind this is that the id identifies the row in the table. Whatever the row has is something that doesn't matters in this problem.

This is in large part a matter of personal preference.

My personal opinion and practice is to always use integer keys and to always use surrogate rather than natural keys (so never use anything like social security number or the genre name directly).

There are cases where an auto number field is not appropriate or does not scale. In these cases it can make sense to use a GUID, which can be a string in databases that do not have a native datatype for it.

使用int时,MSSQL可以为您生成这些id(请参阅IDENTITY关键字)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM