简体   繁体   中英

Rolling id based on foreign key in a hierarchical schema

As an example, consider this hierarchical schema. 在此处输入图片说明

Assume all id fields are auto incrementing primary keys and that foreign keys are named by [parent_table_name]_id convention.

The problem

As soon as there are multiple companies in the database, then companies will share all primary key sequences between them.

For example, if there are two company rows, the customer_group table could look like this

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 3  |     2      |
| 4  |     2      |
| 5  |     1      |
-------------------

But it should look like this

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 1  |     2      |
| 2  |     2      |
| 3  |     1      |
-------------------

This behavior should also be exhibited for customer and any other table in the tree that directly or indirectly references company .

Note that I will most likely make a second id column (named something like relative_id ) for this purpose, keeping the unique id column intact, as this is really mostly for display purposes and how users will reference these data entities.


Now if this was just one level of hierarchy, it would be a relatively simple solution. I could make a table (table_name, company_id, current_id) and a trigger procedure that fires before insert on any of the tables, incrementing the current id by 1 and setting the row's relative_id to that value. It's trivial when the company_id is right there in the insert query.

But how about the tables that don't reference company directly? Like the lowest level of the hierarchy in this example, workorder , which only references customer .
Is there a clean, reusable solution to climb the ladder all the way from 'customer_id' to ultimately retrieve the parenting company_id ?

Going recursively up the hierarchy with SELECTs on each INSERT doesn't sound too appealing to me, performance wise.

I also do not like the idea of just adding a foreign key to company for each of these tables, the schema would get increasingly uglier with each additional table.

But these are the two solutions I can see, but I may not be looking in the right places.

The company shouldn't care what the primary key is if you're using generated keys. They're supposed to be meaningless; compared for equality and nothing else. I grumbled about this earlier , so I'm really glad to see you write:

Note that I will most likely make a second id column (named something like relative_id) for this purpose, keeping the unique id column intact, as this is really mostly for display purposes and how users will reference these data entities.

You're doing it right.

Most of the time it doesn't matter what the ID is, so you can just give them whatever comes out of a sequence and not care about holes/gaps. If you're concerned about inter-company leakage (unlikely) you can obfuscate the IDs by using the sequence as an input to a pseudo-random generator. See the function Daniel Verité wrote in response to my question about this a few years ago, pseudo_encrypt .

There are often specific purposes for which you need perfectly sequential gapless IDs, like invoice numbers. For those you need to use a counter table and - yes - look up the company ID. Such ID generation is slow and has terrible concurrency anyway, so an additional SELECT with a JOIN or two on indexed keys won't hurt much. Don't go recursively up the schema with SELECT s though, just use a series of JOIN s. For example, for an insert into workorder your key generation trigger on workorder would be something like the (untested):

   CREATE OR REPLACE FUNCTION workorder_id_tgfn() RETURNS trigger AS $$
   BEGIN
       IF tg_op = 'INSERT' THEN
           -- Get a new ID, locking the row so no other transaction can add a
           -- workorder until this one commits or rolls back.
           UPDATE workorder_ids
           SET next_workorder_id = next_workorder_id + 1 
           WHERE company_id = (SELECT company_id
                 FROM customer
                 INNER JOIN customer_group ON (customer.customer_group_id = customer_group.id) 
                 INNER JOIN company ON (customer_group.company_id = company.id)
               WHERE customer.id = NEW.customer_id)
           RETURNING next_workorder_id
           INTO NEW.id;
       END IF;
   END;
   $$ LANGUAGE 'plpgsql';

For the UPDATE ... RETURNING ... INTO syntax see Executing a Query with a Single-Row Result .

There can be gaps in normal sequences even if there's no multi-company problem. Observe:

CREATE TABLE demo (id serial primary key, blah text);

BEGIN;
INSERT INTO demo(blah) values ('aa');
COMMIT;

BEGIN;
INSERT INTO demo(blah) values ('bb');
ROLLBACK;

BEGIN;
INSERT INTO demo(blah) values ('aa');
COMMIT;

SELECT * FROM demo;

Result:

regress=#     SELECT * FROM demo;
 id | blah 
----+------
  1 | aa
  3 | aa

"But it should look like this"

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 1  |     2      |
| 2  |     2      |
| 3  |     1      |
-------------------

I think it should not and I think you want a many to many relationship. The customer_group table:

| id | name |
-------------
| 1  |  n1  |
| 2  |  n2  |
| 3  |  n3  |
-------------

And then the customer_group_company table:

| group_id | company_id |
-------------------------
|    1     |     1      |
|    2     |     1      |
|    1     |     2      |
|    2     |     2      |
|    3     |     1      |
-------------------------

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM