
Why is this table in 3NF?

Consider the following table:

[image: enter image description here]

The primary key is a composite key consisting of PatID and PhysName. My professor says this table is in 3rd normal form. I thought it's not even in second normal form because the non-key attribute, Name, is not dependent on the entire primary key. You can identify the Name simply by looking at PatID. It is not dependent on PhysName.

In order to really know whether the table is in 2NF or not, you would have to have the functional dependencies explicitly laid out for you.

Inferring the FDs from a small sample of data is a risky business. The smaller the sample, the greater the risk.

We would have to see a patient with two physicians here to see whether the name is the same. I expect it would be, but that's only common sense.
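To make the risk concrete, here is a minimal sketch of checking whether an FD holds in a given set of rows. The function name `fd_holds` and all sample rows are hypothetical (the original table image is not available); the column names PatID, PhysName and Name follow the question.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this particular set of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample: every patient appears with exactly one physician.
sample = [
    {"PatID": 99999, "PhysName": "Scholl, F.", "Name": "Gore, Z."},
    {"PatID": 11111, "PhysName": "Adams, A.", "Name": "Baker, B."},
]

# Both FDs hold in this sample, including one that common sense rejects.
print(fd_holds(sample, ["PatID"], ["Name"]))     # True
print(fd_holds(sample, ["PhysName"], ["Name"]))  # True

# Hypothetical extra row: the same patient id, recorded under another name
# by a second physician.
more = sample + [{"PatID": 99999, "PhysName": "Adams, A.", "Name": "Gore, Zed"}]
print(fd_holds(more, ["PatID"], ["Name"]))              # False
print(fd_holds(more, ["PatID", "PhysName"], ["Name"]))  # True
```

The small sample "confirms" PhysName -> Name only because no physician happens to appear twice. One more row is enough to change the picture.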

When you move on from classroom exercises to million dollar projects, you'll find that common sense is an unreliable substitute for data analysis.

Given a table value we can see what FDs (functional dependencies) hold in it, hence what its CKs (candidate keys) are and what NFs (normal forms) it satisfies (up to BCNF). (We can't know the CKs & NFs without knowing the FDs.)
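As a rough illustration of reading CKs off a table value: in a value with no duplicate rows, a set of columns is a superkey when no two rows agree on all of those columns, and a CK is a minimal superkey. The sketch below assumes that definition; the helper names and the three sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all the columns in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, columns):
    """Minimal superkeys of this table value, found smallest-first."""
    keys = []
    for size in range(len(columns) + 1):
        for attrs in combinations(columns, size):
            # Skip any superset of a key already found: it is not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical table value (distinct rows).
COLUMNS = ["PatId", "PhysName", "PatName"]
value = [
    {"PatId": 99999, "PhysName": "Scholl, F.", "PatName": "Gore, Z."},
    {"PatId": 11111, "PhysName": "Adams, A.", "PatName": "Baker, B."},
    {"PatId": 22222, "PhysName": "Scholl, F.", "PatName": "Clark, C."},
]

print(candidate_keys(value, COLUMNS))  # [('PatId',), ('PatName',)]
```

For this particular value, {PatId, PhysName} is a superkey but not a CK, because PatId alone already identifies each row.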

An FD (or any constraint) holds in a table variable when it holds in every value that can arise for that variable. The variable's CKs and satisfied NFs are then based on those FDs. So for a variable, example data tells us that certain FDs don't hold, and the "trivial" FDs must hold, but for the other FDs example data just doesn't tell us whether they hold.
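The three cases for a variable can be sketched as a small classifier. This is only an illustration: `classify_fd` and the sample rows are hypothetical, and "undetermined" means the example data alone cannot settle the question.

```python
def classify_fd(rows, lhs, rhs):
    """Say what example rows tell us about lhs -> rhs for the table variable."""
    if set(rhs) <= set(lhs):
        return "holds (trivial)"
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            # Two rows agree on lhs but differ on rhs: a counterexample.
            return "refuted by the sample"
    # No counterexample in these rows, but another value of the variable might have one.
    return "undetermined"

# Hypothetical example data.
rows = [
    {"PatId": 99999, "PhysName": "Scholl, F.", "PatName": "Gore, Z."},
    {"PatId": 22222, "PhysName": "Scholl, F.", "PatName": "Clark, C."},
]

print(classify_fd(rows, ["PatId", "PatName"], ["PatName"]))  # holds (trivial)
print(classify_fd(rows, ["PhysName"], ["PatName"]))          # refuted by the sample
print(classify_fd(rows, ["PatId"], ["PatName"]))             # undetermined
```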

Since the table value doesn't have {PatId, PhysName} as a CK, your instructor must mean that some variable with that value has that CK. (Of course, you should have got value vs variable straight anyway.) To consider that the variable has that CK, they must have decided something like:

  • the table holds rows that make a true statement from "a physician named PhysName tends a patient they identify as PatId and know by name PatName"
  • each physician with a given name knows their patients with a given id by only one name
  • (we don't know that it's false that) two different physicians could identify two different patients by the same id
  • likely that each physician has a unique name
  • likely that each physician identifies every one of their patients by an id
  • likely that a physician identifies just one patient via a given id
  • likely that a physician identifies a patient via only one id
  • likely that "identifies" always means a 1:1 correspondence of entities & ids
  • likely that each patient has only one name
  • etc
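Once a set of FDs is agreed for the variable, the CKs and a 2NF check follow mechanically. The sketch below compares two hypothetical readings: one where only {PatId, PhysName} -> PatName is assumed (roughly the assumptions listed above), and one where PatId -> PatName is also assumed (the reading in the question). Both FD sets are assumptions for illustration, not something the example data establishes.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(columns, fds):
    """Minimal attribute sets whose closure is every column."""
    keys = []
    for size in range(len(columns) + 1):
        for attrs in combinations(columns, size):
            if any(k <= set(attrs) for k in keys):
                continue  # superset of a known key, so not minimal
            if closure(attrs, fds) == set(columns):
                keys.append(set(attrs))
    return keys

def violates_2nf(columns, fds):
    """True if a non-prime attribute depends on a proper subset of some CK."""
    keys = candidate_keys(columns, fds)
    prime = set().union(*keys)
    for key in keys:
        for size in range(len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return True
    return False

COLUMNS = ["PatId", "PhysName", "PatName"]

# Assumed reading 1: ids are only meaningful per physician.
per_physician = [(("PatId", "PhysName"), ("PatName",))]
# Assumed reading 2: a patient id alone determines the patient's name.
global_ids = per_physician + [(("PatId",), ("PatName",))]

print(candidate_keys(COLUMNS, per_physician) == [{"PatId", "PhysName"}])  # True
print(violates_2nf(COLUMNS, per_physician))  # False
print(violates_2nf(COLUMNS, global_ids))     # True
```

Under the first reading the only nontrivial FD has a CK as its determinant, which is consistent with the instructor's claim. Under the second reading PatName depends on part of the key, which is the 2NF violation the question describes. Which reading applies is exactly what the business rules have to settle.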

You need to know whether you are talking about a value or a variable, and it's pointless to argue about a variable and its constraints (including FDs) until you agree on the predicate and the BRs (business rules).


PS Re BRs, predicates & constraints:

A proposition is a statement about a situation: "a physician named 'Scholl, F.' tends a patient they identify as 99999 and know by name 'Gore, Z.'". A predicate is a statement template mapping from a row of column names & values to a proposition: "a physician named PhysName tends a patient they identify as PatId and know by name PatName". A table variable holds the rows that form true propositions in a situation.
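The predicate-as-template idea can be sketched directly: a format string stands in for the predicate, and filling it from a row yields a proposition. The names `PREDICATE` and `proposition` are hypothetical; the row is the one used in the example sentence above.

```python
# The predicate: a statement template with one placeholder per column.
PREDICATE = (
    "a physician named {PhysName!r} tends a patient they identify "
    "as {PatId} and know by name {PatName!r}"
)

def proposition(row):
    """Map a row (column name -> value) to the proposition it asserts."""
    return PREDICATE.format(**row)

row = {"PhysName": "Scholl, F.", "PatId": 99999, "PatName": "Gore, Z."}
print(proposition(row))
# a physician named 'Scholl, F.' tends a patient they identify as 99999 and know by name 'Gore, Z.'
```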

BRs (business rules) give variable predicates and characterize what situations can arise. Hence what table variable values can arise, hence what FDs hold, hence the CKs, etc.


 