
Is there any drawback when tying a foreign key to a non-primary key?

Suppose we have a table of jobs like so:

  • job (id, number);

Suppose we have a customer request to track job times and we create a table like so:

  • (A) job_timer (id, job_id, timestamp);

But we have a choice of how to tie it to the job table, so we could also write:

  • (B) job_timer (id, job_number, timestamp);

Suppose that job_number is UNIQUE.

I am conditioned to make foreign keys based on id, so (A) job_id is the way I have seen it done. But customers look up jobs by job number, and it would save me and the database some work if I did the lookup directly by (B) job_number. Should I do so?

By "more work" when using job_id I mean this: given a job number, with (B) I only need the job_timer table, whereas with (A) I need to join both tables. That is more cognitive programming work and more database work.
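To make the difference in lookup work concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names job_timer_a and job_timer_b, the VARCHAR(20) length and the sample values are assumptions made for illustration; only job, id, number, job_id, job_number and timestamp come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE job (
    id     INTEGER PRIMARY KEY,
    number VARCHAR(20) NOT NULL UNIQUE
);
-- Option (A): reference the primary key (hypothetical table name)
CREATE TABLE job_timer_a (
    id        INTEGER PRIMARY KEY,
    job_id    INTEGER NOT NULL REFERENCES job (id),
    timestamp TEXT NOT NULL
);
-- Option (B): reference the other unique column (hypothetical table name)
CREATE TABLE job_timer_b (
    id         INTEGER PRIMARY KEY,
    job_number VARCHAR(20) NOT NULL REFERENCES job (number),
    timestamp  TEXT NOT NULL
);
INSERT INTO job (id, number) VALUES (1, 'J-100');
INSERT INTO job_timer_a (job_id, timestamp) VALUES (1, '2024-01-01T09:00:00');
INSERT INTO job_timer_b (job_number, timestamp) VALUES ('J-100', '2024-01-01T09:00:00');
""")

# (A) Given a job number, the lookup has to join through job.
rows_a = conn.execute(
    "SELECT t.timestamp FROM job_timer_a AS t "
    "JOIN job AS j ON j.id = t.job_id WHERE j.number = ?",
    ("J-100",),
).fetchall()

# (B) Given a job number, job_timer alone is enough.
rows_b = conn.execute(
    "SELECT timestamp FROM job_timer_b WHERE job_number = ?",
    ("J-100",),
).fetchall()
```

Both queries return the same rows; option (B) simply avoids the join when the caller starts from a job number.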

Note: a similar question, "Foreign Key to non-primary key", addresses the UNIQUE-ness of the non-primary key, but I don't believe it addresses my issue.

The original table (job) is part of a legacy codebase where both fields, id and number, are used extensively throughout the code, and in this sense I have a "split primary key" condition. Weeding that out would be prohibitive due to the lack of full test coverage. Why the number field was created, and why it was made a varchar, are good questions. I am sure there must have been a reason at some point.

I'm looking for something like "yes, go ahead it's totally fine, there is no best practice in this case", or "no, if you do this you might come across issues X, Y, Z in the future".

TL;DR Always declare a FOREIGN KEY constraint (chain) when values for a list of columns must appear as values for another list of PRIMARY KEY or UNIQUE NOT NULL columns. But choosing which CK (candidate key) a FK (foreign key) should reference when there are multiple CKs is ultimately pragmatic. The criteria are essentially those for choosing a PK (primary key), since distinguishing a CK as PK is ultimately for preferred use in FKs. A typical list is familiarity, irreducibility, stability & simplicity.

Here past use suggests that either CK is reasonable, although treating number as being only for final output would explain its varcharness and its uniqueness despite the presence & uniqueness of id. If you ever include both in one table, be aware that it might be appropriate to declare FOREIGN KEY on the pair. (That requires adding UNIQUE NOT NULL on the pair in job.)
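As a rough illustration of declaring the constraint against the UNIQUE NOT NULL column, the sketch below runs on SQLite through Python's sqlite3 module. The sample job numbers, the VARCHAR(20) length and the helper names try_insert and try_rename are hypothetical. It also touches on the stability criterion: with a default (no action) foreign key, a referenced number cannot be changed while timer rows still point at it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE job (
    id     INTEGER PRIMARY KEY,
    number VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE job_timer (
    id         INTEGER PRIMARY KEY,
    job_number VARCHAR(20) NOT NULL REFERENCES job (number),
    timestamp  TEXT NOT NULL
);
INSERT INTO job (id, number) VALUES (1, 'J-100');
""")

def try_insert(job_number):
    """Hypothetical helper: True if the DBMS accepts the timer row."""
    try:
        conn.execute(
            "INSERT INTO job_timer (job_number, timestamp) VALUES (?, ?)",
            (job_number, "2024-01-01T09:00:00"),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def try_rename(old_number, new_number):
    """Hypothetical helper: True if the DBMS accepts changing a job number."""
    try:
        conn.execute(
            "UPDATE job SET number = ? WHERE number = ?",
            (new_number, old_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False

known_accepted = try_insert("J-100")    # number exists in job
unknown_accepted = try_insert("J-999")  # no such job: FK violation
rename_accepted = try_rename("J-100", "J-101")  # blocked while referenced
```

If job numbers are expected to change, ON UPDATE CASCADE on the foreign key is one way to handle that; whether it suits the legacy schema here is a separate judgment.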


A superkey is a set of columns that are unique not null. A CK is a superkey that contains no smaller superkey. A table can have any number of CKs. A PK is a distinguished CK.
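These definitions can be checked mechanically against sample rows. The sketch below is only an illustration: the function names and the sample data are made up, and passing such a check on some rows only shows that the data is consistent with a key, not that the constraint holds for the table in general.

```python
from itertools import combinations

def is_superkey(rows, cols):
    """True if the values under cols are non-null and unique across rows."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if any(value is None for value in key) or key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, columns):
    """Superkeys that contain no smaller superkey, smallest first."""
    keys = []
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            contains_smaller = any(set(k) <= set(cols) for k in keys)
            if is_superkey(rows, cols) and not contains_smaller:
                keys.append(cols)
    return keys

# Hypothetical sample rows shaped like the job table.
jobs = [
    {"id": 1, "number": "J-100"},
    {"id": 2, "number": "J-200"},
]

pair_is_superkey = is_superkey(jobs, ("id", "number"))
job_candidate_keys = candidate_keys(jobs, ("id", "number"))
```

On this data the pair (id, number) is a superkey but not a CK, because id alone and number alone are already superkeys.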

We could say that a "foreign superkey" holds when the values for a subrow in a table are also values for some superkey subrow in a referenced table. If the superkey is a CK then the foreign superkey is a FK. We tell the DBMS about CKs and foreign superkeys so that it can prevent invalid database states.

An SQL UNIQUE NOT NULL actually declares a superkey. So SQL PRIMARY KEY actually declares a distinguished superkey. It is a PK if the superkey is a CK. An SQL FOREIGN KEY actually declares a foreign superkey. It is a FK if the referenced superkey is a CK.

Your table with a "split PK" is just a table with two CKs. (Together they form a superkey, because every superset of a CK is a superkey.) As far as constraint declarations are concerned, primacy is irrelevant. You should just declare the constraints that hold so that the DBMS can enforce them.

Be aware that if you have a table with id and number as FKs then it is likely that pairs of values must appear in job . If so then declare the pair as a foreign superkey via FOREIGN KEY . This need to add foreign superkeys is a disadvantage of having surrogate keys when there are natural keys. On the other hand this arises whenever there are multiple CKs.
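The difference between two separate FKs and one FK on the pair can be sketched as follows, again on SQLite via Python's sqlite3 module. The referencing tables job_note_separate and job_note_paired, the helper accepts and the sample values are hypothetical. Note the UNIQUE (id, number) declaration on job, which SQLite needs as the target of the compound FK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE job (
    id     INTEGER PRIMARY KEY,
    number VARCHAR(20) NOT NULL UNIQUE,
    UNIQUE (id, number)  -- compound superkey, the target of the paired FK
);
-- Two independent FKs: each value must exist, but not as a pair.
CREATE TABLE job_note_separate (
    job_id     INTEGER NOT NULL REFERENCES job (id),
    job_number VARCHAR(20) NOT NULL REFERENCES job (number)
);
-- One compound FK: the pair must appear together in job.
CREATE TABLE job_note_paired (
    job_id     INTEGER NOT NULL,
    job_number VARCHAR(20) NOT NULL,
    FOREIGN KEY (job_id, job_number) REFERENCES job (id, number)
);
INSERT INTO job (id, number) VALUES (1, 'J-100'), (2, 'J-200');
""")

def accepts(table, job_id, job_number):
    """Hypothetical helper: True if the DBMS accepts the row."""
    try:
        conn.execute(
            f"INSERT INTO {table} (job_id, job_number) VALUES (?, ?)",
            (job_id, job_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# id 1 and number 'J-200' both exist, but they belong to different jobs.
separate_mismatch = accepts("job_note_separate", 1, "J-200")
paired_mismatch = accepts("job_note_paired", 1, "J-200")
paired_match = accepts("job_note_paired", 1, "J-100")
```

With separate FKs the mismatched pair slips through; the compound FK rejects it and still accepts a pair that really appears in job.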

PS Any superset of a unique column set is unique. But SQL requires you to declare the target of a FK as UNIQUE NOT NULL even if it must already be unique because it contains some smaller set declared unique. So when there is an id-number pair that has to appear in job, you should declare both the compound FK and the compound superkey.

PPS The point of all these declarations is integrity, not indexing for optimization. (Although that's important too.)
