Normalizing a table to 5NF

I'm having a hard time deciding whether a relation should be normalized to 5NF.

Let's say I have an all-key relation made up of:

A*, B*, C, D

  • A and B are foreign keys referencing another table whose primary key is (A, B)
  • C can be X1, X2, or X3
  • D can be Y1, Y2, or Y3

In the relation, C and D values appear in combination: each row pairs one C value with one D value.

Example data:

  • 1, 2, X1, Y2
  • 3, 4, X2, Y2
  • 5, 6, X1, Y3
  • 7, 8, X2, Y1

Does it make sense to normalize this relation into the following:

  • A, B, C
  • A, B, D
  • C, D

Here, the relation holding (C, D) would contain all possible combinations of C and D values.
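One way to test whether the proposed three-way decomposition is lossless for a given set of rows is to do the following:

1. Project the rows onto (A,B,C), (A,B,D) and (C,D).
2. Natural-join the projections.
3. Compare the result with the original rows.

The Python sketch below does this with plain sets of tuples. The helper names `project`, `natural_join` and `lossless` are hypothetical. For (C,D) it uses the projection of the data rather than the full set of combinations. It only checks the sample rows; it does not check an integrity rule in general.

```python
from itertools import product

# Sample rows from the question, as (A, B, C, D) tuples.
COLUMNS = ("A", "B", "C", "D")
rows = {
    (1, 2, "X1", "Y2"),
    (3, 4, "X2", "Y2"),
    (5, 6, "X1", "Y3"),
    (7, 8, "X2", "Y1"),
}


def project(rel, cols, attrs):
    """Project a set of tuples with column names `cols` onto `attrs`."""
    idx = [cols.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}


def natural_join(r, r_cols, s, s_cols):
    """Natural join of two relations; returns (tuples, column names)."""
    shared = [c for c in r_cols if c in s_cols]
    extra = [c for c in s_cols if c not in r_cols]
    out = set()
    for t, u in product(r, s):
        if all(t[r_cols.index(c)] == u[s_cols.index(c)] for c in shared):
            out.add(t + tuple(u[s_cols.index(c)] for c in extra))
    return out, tuple(r_cols) + tuple(extra)


def lossless(rel, cols, components):
    """True if joining the projections onto `components` gives back `rel`."""
    joined, joined_cols = project(rel, cols, components[0]), tuple(components[0])
    for comp in components[1:]:
        joined, joined_cols = natural_join(
            joined, joined_cols, project(rel, cols, comp), tuple(comp))
    # Reorder the joined columns to match `cols` before comparing.
    return project(joined, joined_cols, cols) == rel


print(lossless(rows, COLUMNS, [("A", "B", "C"), ("A", "B", "D"), ("C", "D")]))  # True
print(lossless(rows, COLUMNS, [("A", "B", "C"), ("A", "B", "D")]))  # True
```

For this sample, both the three-way and the two-way decomposition reproduce the original rows.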

If (A,B) is a key in your relation (assuming this is what the stars indicate), then it is already in 4NF, since C and D are each functionally dependent on (A,B). The decomposition into 5NF is then simply:

(A,B,C)
(A,B,D)

You don't need the additional relation (C,D). A quick check in SQL (SQLite syntax) confirms that, for your example data, joining the two projections gives back the original rows:

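-- SQLite: column types may be omitted; (A,B) is declared as the key
create table t1(A, B, C, primary key (A, B));
create table t2(A, B, D, primary key (A, B));

insert into t1 values (1,2,'X1'), (3,4,'X2'), (5,6,'X1'), (7,8,'X2');
insert into t2 values (1,2,'Y2'), (3,4,'Y2'), (5,6,'Y3'), (7,8,'Y1');

-- natural join matches rows on the shared columns A and B
select * from t1 natural join t2;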

A           B           C           D
----------  ----------  ----------  ----------
1           2           X1          Y2
3           4           X2          Y2
5           6           X1          Y3
7           8           X2          Y1
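The question's extra relation holding all possible (C, D) combinations can be checked the same way. The sketch below uses Python's standard `sqlite3` module and works as follows:

- It rebuilds t1 and t2 from above.
- It adds a hypothetical table `cd` holding every pair of the C values X1 to X3 and the D values Y1 to Y3.
- It compares t1 natural join t2 with t1 natural join t2 natural join cd.

Because `cd` is a full cross product, the extra join cannot filter out any row.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(A, B, C, primary key (A, B));
    create table t2(A, B, D, primary key (A, B));
    insert into t1 values (1,2,'X1'), (3,4,'X2'), (5,6,'X1'), (7,8,'X2');
    insert into t2 values (1,2,'Y2'), (3,4,'Y2'), (5,6,'Y3'), (7,8,'Y1');
    -- hypothetical table with every possible (C, D) pair
    create table cd(C, D, primary key (C, D));
""")
conn.executemany(
    "insert into cd values (?, ?)",
    itertools.product(["X1", "X2", "X3"], ["Y1", "Y2", "Y3"]),
)

two_way = sorted(conn.execute(
    "select A, B, C, D from t1 natural join t2").fetchall())
three_way = sorted(conn.execute(
    "select A, B, C, D from t1 natural join t2 natural join cd").fetchall())

print(two_way == three_way)  # True: cd removes no rows
```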

As to whether your decomposition makes sense: in general, I would always choose the relational design that ensures maximum data consistency. In your case, going from 4NF to 5NF does not protect you from any further insert/update/delete anomalies. You simply partition your data vertically. That might make sense for separation of concerns, but data consistency does not require it.
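A small sketch of that point, assuming (A,B) is the primary key and using Python's `sqlite3` module. The original table (hypothetically named `r`) and the decomposed tables t1 and t2 both reject a second C value for the same (A,B). The split therefore adds no integrity check that the original table lacked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- original relation, assuming (A, B) is its primary key
    create table r(A, B, C, D, primary key (A, B));
    -- decomposed design from above
    create table t1(A, B, C, primary key (A, B));
    create table t2(A, B, D, primary key (A, B));
    insert into r  values (1, 2, 'X1', 'Y2');
    insert into t1 values (1, 2, 'X1');
    insert into t2 values (1, 2, 'Y2');
""")


def rejects(sql):
    """Return True if executing `sql` violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Try to give (1, 2) a second C value in each design.
print(rejects("insert into r  values (1, 2, 'X2', 'Y2')"))  # True
print(rejects("insert into t1 values (1, 2, 'X2')"))        # True
```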

Edit: added a discussion of the case where the key is (A,B,C,D).

Suppose (A,B,C,D) is the key of your relation, and the project-join dependency in your question (R = (A,B,C) * (A,B,D) * (C,D)) holds as a data integrity rule, not just for your example data. In that case the 5NF schema enforces data consistency, whereas your original schema does not (you can get insert/update/delete anomalies). From a logical point of view, you should therefore use the 5NF schema; otherwise you have to enforce data integrity at the application level.
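To see the anomaly concretely, the sketch below treats R = (A,B,C) * (A,B,D) * (C,D) as an integrity rule and checks it in plain Python. The three rows are made up for illustration and are not from the question.

The all-key table can store these rows without complaint. Joining their projections, however, produces an extra row, (1, 2, 'X1', 'Y2'), so the stored state violates the rule. In the decomposed schema, that row would be implied automatically by the join.

```python
# Hypothetical rows (A, B, C, D) stored in the original all-key table.
rows = {
    (1, 2, "X1", "Y1"),
    (1, 2, "X2", "Y2"),
    (3, 4, "X1", "Y2"),
}


def reconstruct(rel):
    """Join the projections (A,B,C), (A,B,D) and (C,D) of `rel`."""
    abc = {(a, b, c) for a, b, c, _ in rel}
    abd = {(a, b, d) for a, b, _, d in rel}
    cd = {(c, d) for _, _, c, d in rel}
    return {
        (a, b, c, d)
        for (a, b, c) in abc
        for (a2, b2, d) in abd
        if (a, b) == (a2, b2) and (c, d) in cd
    }


def satisfies_jd(rel):
    """True if `rel` equals the join of its three projections."""
    return reconstruct(rel) == rel


print(satisfies_jd(rows))        # False
print(reconstruct(rows) - rows)  # {(1, 2, 'X1', 'Y2')}
```

Adding the implied row restores the rule: `satisfies_jd(rows | {(1, 2, "X1", "Y2")})` returns True. With the original schema, keeping this consistent on every insert is left to the application.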

As usual (and as with 3NF), specific performance requirements can force you to denormalize your schema, e.g., to save joins when querying your data. Unless forced to do so, though, I would always go for the best conceptual schema possible. In many DBMSs, you can even improve query performance for your 5NF design at the physical level without giving up a proper logical relational design. Proper indexes and/or incrementally maintained materialized views are two ways to do this. But of course, at some point you might have to trade consistency for performance or space efficiency.
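As a rough illustration of keeping the 5NF tables while hiding the joins from queries, here is a sketch using Python's `sqlite3` module. The table and view names (`abc`, `abd`, `cd`, `r`) are hypothetical.

- SQLite has no materialized views, so this uses an ordinary view. A DBMS with incremental materialized views could materialize it instead.
- The composite primary keys give SQLite unique indexes whose leading columns match the join columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 5NF tables for the case where (A, B, C, D) is the key;
    -- each composite primary key also gives SQLite a unique index.
    create table abc(A, B, C, primary key (A, B, C));
    create table abd(A, B, D, primary key (A, B, D));
    create table cd(C, D, primary key (C, D));

    -- The view reassembles the original relation for querying.
    create view r as
        select A, B, C, D from abc natural join abd natural join cd;

    insert into abc values (1,2,'X1'), (3,4,'X2'), (5,6,'X1'), (7,8,'X2');
    insert into abd values (1,2,'Y2'), (3,4,'Y2'), (5,6,'Y3'), (7,8,'Y1');
    insert into cd  values ('X1','Y2'), ('X2','Y2'), ('X1','Y3'), ('X2','Y1');
""")

rows = conn.execute("select * from r order by A").fetchall()
print(rows)
# [(1, 2, 'X1', 'Y2'), (3, 4, 'X2', 'Y2'), (5, 6, 'X1', 'Y3'), (7, 8, 'X2', 'Y1')]
```

Queries can then target `r` as if it were the original table, while inserts go to the normalized tables.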
