What is your opinion on using textual identifiers in table columns when approaching the database with normalization and scalability in mind?

Which table structure is considered better normalized?

For example:

Note: idType indicates what kind of item the comment was made on, and subjectid is the id of that item.

Using idType as the textually named identifier for the subjectid:

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

Compared to this:

commentid ---- postid ----- photoid-----replyid
-----------------------------------------------
1                22          NULL        NULL
2                NULL         56         NULL
3                23          NULL        NULL
4                NULL        NULL        55
5                26          NULL        NULL

Looking at both, I don't think I could attach a foreign key constraint to the first table (i.e., so that a comment is deleted when its post or photo is deleted), whereas in the second one that is possible. How would you approach this kind of problem, keeping in mind that the database will need to expand over time and that data integrity is also important?

Thanks

The first is more normalized, if slightly incomplete. There are a couple of approaches you can take. The simplest (and, strictly speaking, the most 'correct') needs two tables, with the obvious FK constraint from idType to a table of allowed types:

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

idType
------
post
photo
reply
status

If you like, you can use a char(1) or similar to reduce the impact of the varchar on key/index length, or to make it easier to use with an ORM if you plan to use one. NULLs are always a bother; if you start to see them turn up in your design, you will be better off finding a convenient way to eliminate them.
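The two-table approach can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names `id_types` and `comments`, the column types, and the helper `try_insert` are hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Lookup table listing the allowed idType values (hypothetical name).
    CREATE TABLE id_types (
        idType TEXT PRIMARY KEY
    );
    CREATE TABLE comments (
        commentid INTEGER PRIMARY KEY,
        subjectid INTEGER NOT NULL,
        idType    TEXT NOT NULL REFERENCES id_types(idType)
    );
    INSERT INTO id_types VALUES ('post'), ('photo'), ('reply'), ('status');
    INSERT INTO comments VALUES (1, 22, 'post'), (2, 26, 'photo');
""")


def try_insert(commentid, subjectid, id_type):
    """Insert a comment; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO comments VALUES (?, ?, ?)",
                (commentid, subjectid, id_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(3, 84, "reply"))  # known type, accepted
print(try_insert(4, 36, "video"))  # unknown type, rejected by the FK
```

Note that this FK only guarantees that idType is one of the known types. It does not check that subjectid refers to an existing post or photo, which is the gap the sub-type table approach addresses.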

The second approach is the one I prefer when dealing with more than 100 million rows:

commentid ---- subjectid
------------------------
1                22    
2                26     
3                84     
4                36     
5                22     

postIds ---- subjectid
----------------------
1                22   
4                36   

photoIds ---- subjectid
-----------------------
2                26    

replyIds ---- subjectid
-----------------------
3                84    

statusIds ---- subjectid
------------------------
5                22     
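One possible reading of the layout above, sketched with Python's sqlite3 module: each sub-type table holds the (commentid, subjectid) pairs for one kind of subject, so its subjectid can carry a real foreign key. The names `posts`, `photos`, `post_comments`, and `photo_comments`, and the ON DELETE CASCADE rules, are assumptions made for illustration and are not spelled out in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite

conn.executescript("""
    -- Hypothetical subject tables.
    CREATE TABLE posts  (postid  INTEGER PRIMARY KEY);
    CREATE TABLE photos (photoid INTEGER PRIMARY KEY);

    CREATE TABLE comments (
        commentid INTEGER PRIMARY KEY,
        subjectid INTEGER NOT NULL
    );
    -- One sub-type table per kind of subject; subjectid is a real FK here.
    CREATE TABLE post_comments (
        commentid INTEGER PRIMARY KEY
                  REFERENCES comments(commentid) ON DELETE CASCADE,
        subjectid INTEGER NOT NULL
                  REFERENCES posts(postid) ON DELETE CASCADE
    );
    CREATE TABLE photo_comments (
        commentid INTEGER PRIMARY KEY
                  REFERENCES comments(commentid) ON DELETE CASCADE,
        subjectid INTEGER NOT NULL
                  REFERENCES photos(photoid) ON DELETE CASCADE
    );

    INSERT INTO posts VALUES (22), (36);
    INSERT INTO photos VALUES (26);
    INSERT INTO comments VALUES (1, 22), (2, 26), (4, 36);
    INSERT INTO post_comments VALUES (1, 22), (4, 36);
    INSERT INTO photo_comments VALUES (2, 26);
""")

# Deleting a post removes its rows from post_comments via the cascade.
with conn:
    conn.execute("DELETE FROM posts WHERE postid = 22")

remaining = [
    row[0]
    for row in conn.execute("SELECT commentid FROM post_comments ORDER BY commentid")
]
print(remaining)
```

In this sketch the cascade only removes the row in `post_comments`; the matching row in `comments` stays behind and would need a trigger or a cleanup job if it should disappear as well.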

There is, of course, also the (slightly denormalized) hybrid approach, which I use extensively with large datasets, as they tend to be dirty. Simply provide the specialization tables for the predefined idTypes, but keep an ad hoc idType column on the commentId table.

Note that even the hybrid approach requires only 2x the space of the denormalized table, and it makes restricting a query by idType trivial. The integrity constraint, however, is not straightforward, since it is an FK constraint on a derived UNION of the type tables. My general approach is to use a trigger on either the hybrid table or an equivalent updatable view to propagate updates to the correct sub-type table.
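A minimal sketch of the trigger idea, again using Python's sqlite3 module. The trigger names, the sub-type table names, and the choice to route only inserts are assumptions made for illustration; a real setup would also need to handle updates, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Hybrid table: keeps the ad hoc idType column.
    CREATE TABLE comments (
        commentid INTEGER PRIMARY KEY,
        subjectid INTEGER NOT NULL,
        idType    TEXT NOT NULL
    );
    -- Specialization tables for the predefined types (hypothetical names).
    CREATE TABLE post_comments (
        commentid INTEGER PRIMARY KEY
                  REFERENCES comments(commentid) ON DELETE CASCADE,
        subjectid INTEGER NOT NULL
    );
    CREATE TABLE photo_comments (
        commentid INTEGER PRIMARY KEY
                  REFERENCES comments(commentid) ON DELETE CASCADE,
        subjectid INTEGER NOT NULL
    );

    -- Route each new comment to the matching sub-type table.
    CREATE TRIGGER comments_route_post
    AFTER INSERT ON comments
    WHEN NEW.idType = 'post'
    BEGIN
        INSERT INTO post_comments (commentid, subjectid)
        VALUES (NEW.commentid, NEW.subjectid);
    END;

    CREATE TRIGGER comments_route_photo
    AFTER INSERT ON comments
    WHEN NEW.idType = 'photo'
    BEGIN
        INSERT INTO photo_comments (commentid, subjectid)
        VALUES (NEW.commentid, NEW.subjectid);
    END;
""")

with conn:
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?)",
        [(1, 22, "post"), (2, 26, "photo"), (5, 22, "status")],
    )

post_rows = conn.execute("SELECT commentid, subjectid FROM post_comments").fetchall()
photo_rows = conn.execute("SELECT commentid, subjectid FROM photo_comments").fetchall()
print(post_rows, photo_rows)
```

The 'status' comment has no specialization table in this sketch, so it lives only in the hybrid table, which is the kind of ad hoc type the idType column is kept for.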

Both the simple approach and the more complex sub-type table approach work. Still, for most purposes KISS applies, so I suspect you should just introduce an ID_TYPES table and the relevant FK, and be done with it.
