
Normalizing table to 5NF

I'm having a hard time deciding whether a relation should be normalized to 5NF.

Let's say I have an all-key relation made up of:

A*, B*, C, D

  • A and B are foreign keys referencing another table that has (A, B) as its primary key
  • C can be X1, X2 or X3
  • D can be Y1, Y2 or Y3

In the relation, the values of C and D occur in combination with each other.

Example data:

  • 1, 2, X1, Y2
  • 3, 4, X2, Y2
  • 5, 6, X1, Y3
  • 7, 8, X2, Y1

Does it make sense to normalize this relation into the following?

  • A, B, C
  • A, B, D
  • C, D

Here the relation that holds (C, D) contains all possible combinations.

If (A,B) is a key of your relation (assuming that is what the stars indicate), then it is already in 4NF, since C and D are each functionally dependent on (A,B). The decomposition into 5NF is then simply:

(A,B,C)
(A,B,D)

You don't need a further relation (C,D). A quick check in SQL confirms that for your example data:

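-- Untyped column declarations like these are accepted by SQLite.
create table t1(A,B,C);
create table t2(A,B,D);

insert into t1 values (1,2,'X1'), (3,4,'X2'), (5,6,'X1'), (7,8,'X2');
insert into t2 values (1,2,'Y2'), (3,4,'Y2'), (5,6,'Y3'), (7,8,'Y1');

-- The natural join matches rows on the shared columns A and B.
select * from t1 natural join t2;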

A           B           C           D
----------  ----------  ----------  ----------
1           2           X1          Y2
3           4           X2          Y2
5           6           X1          Y3
7           8           X2          Y1
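
The argument above rests on the assumption that (A,B) functionally determines both C and D. As a minimal sketch, that assumption can be tested against a concrete set of rows using Python's built-in sqlite3 module. The table name `r` and the helper `fd_violations` are hypothetical names chosen for illustration, and a clean result only says something about the data at hand, not about the integrity rule itself.

```python
import sqlite3

def fd_violations(rows):
    """Return the (A, B) pairs that map to more than one C or more than one D."""
    con = sqlite3.connect(":memory:")
    con.execute("create table r(A, B, C, D)")  # hypothetical table name
    con.executemany("insert into r values (?, ?, ?, ?)", rows)
    bad = con.execute(
        "select A, B from r group by A, B "
        "having count(distinct C) > 1 or count(distinct D) > 1"
    ).fetchall()
    con.close()
    return bad

example = [(1, 2, "X1", "Y2"), (3, 4, "X2", "Y2"),
           (5, 6, "X1", "Y3"), (7, 8, "X2", "Y1")]
print(fd_violations(example))  # an empty list means no violations in this data
```

If the list is non-empty, (A,B) is not a key for that data, and the discussion of the all-key case applies instead.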

As to whether it makes sense to decompose your relation: in general, I would always go for the relational design that ensures maximum data consistency. In your case, going from 4NF to 5NF does not protect you from any further insert/update/delete anomalies. You simply partition your data vertically (by columns), which might make sense from a separation-of-concerns point of view, but is not required for data consistency.

Edit: Added discussion for the case where the key is (A,B,C,D)

If (A,B,C,D) is the key of your relation, and the project-join dependencies in your data are the ones you put in your question (R = (A,B,C) * (A,B,D) * (C,D), not only for your example data, but as a data integrity rule), then the 5NF schema will enforce your data consistency whereas your original schema will not (you can have insert/update/delete anomalies). Thus, from a logical point of view, you should use the 5NF schema; otherwise you have to enforce data integrity at the application level.
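
To see what the join dependency R = (A,B,C) * (A,B,D) * (C,D) means for concrete rows, here is a minimal sketch using Python's built-in sqlite3 module. It projects a relation onto the three components and natural-joins them back; the dependency holds for that data exactly when the result equals the original set of rows. The table names and the helper `rejoin_projections` are hypothetical, chosen only for this illustration.

```python
import sqlite3

def rejoin_projections(rows):
    """Project r(A,B,C,D) onto (A,B,C), (A,B,D), (C,D) and natural-join them back."""
    con = sqlite3.connect(":memory:")
    con.execute("create table r(A, B, C, D)")
    con.executemany("insert into r values (?, ?, ?, ?)", rows)
    # The three projections named in the join dependency.
    con.execute("create table abc as select distinct A, B, C from r")
    con.execute("create table abd as select distinct A, B, D from r")
    con.execute("create table cd  as select distinct C, D from r")
    joined = con.execute(
        "select * from abc natural join abd natural join cd"
    ).fetchall()
    con.close()
    return set(joined)

# The example data from the question rejoins to exactly itself.
example = [(1, 2, "X1", "Y2"), (3, 4, "X2", "Y2"),
           (5, 6, "X1", "Y3"), (7, 8, "X2", "Y1")]
print(rejoin_projections(example) == set(example))

# Made-up all-key data for which the dependency does not hold:
# the rejoin produces the extra row (1, 2, 'X1', 'Y2').
violating = [(1, 2, "X1", "Y1"), (1, 2, "X2", "Y2"), (3, 4, "X1", "Y2")]
print(rejoin_projections(violating) - set(violating))
```

In the second data set, the single-table schema happily stores three rows even though the integrity rule would require a fourth. That is the kind of inconsistency the three-table schema rules out by construction.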

As usual (and this holds for 3NF, too), there can be specific performance requirements that force you to denormalize your schema (e.g., to save joins when querying your data), but unless forced to do so, I would always go for the best conceptual schema possible. For many DBMSs, query performance for your 5NF design can even be improved at the physical level by, e.g., using proper indexes and/or incremental materialized views, without giving up a proper logical relational design. But of course you might have to trade consistency for performance or space efficiency at some point.
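
As a rough sketch of keeping the logical 5NF design while still offering queries a single relation, the three tables can sit behind a view. This again uses Python's built-in sqlite3 module. SQLite has no materialized views, so a plain view stands in here, and the composite primary keys provide the indexes. All table and view names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Each all-key table gets a composite primary key, which SQLite indexes.
    create table abc (A, B, C, primary key (A, B, C));
    create table abd (A, B, D, primary key (A, B, D));
    create table cd  (C, D,    primary key (C, D));
    -- The original relation is derived, never stored.
    create view r as select * from abc natural join abd natural join cd;
""")
con.executemany("insert into abc values (?, ?, ?)", [(1, 2, "X1"), (1, 2, "X2")])
con.executemany("insert into abd values (?, ?, ?)", [(1, 2, "Y1"), (1, 2, "Y2")])
con.executemany("insert into cd values (?, ?)", [("X1", "Y1"), ("X2", "Y2")])

# Only the (C, D) combinations listed in cd appear in the derived relation.
rows = sorted(con.execute("select A, B, C, D from r"))
print(rows)
```

On a DBMS that supports materialized views, the same view definition could be materialized to avoid recomputing the joins on every query, at the cost of extra storage.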

