
MySQL: Replace all instances of specific foreign key with new value

I have a MySQL database with 1000s of personnel records, often with duplicates.

For each case with at least one duplicate, I want to delete all of the duplicates but one, then update every reference to the deleted keys so that it points to the one I kept.

For example, we see two instances of Star Lord below:

+-----------------------+
|        `users`        |
+------+----------------+
| id   | name           |
+------+----------------+
| 1    | Star Lord      |
+------+----------------+
| 2    | Star Lord      |
+------+----------------+
| 3    | Iron Man       |
+------+----------------+

+-----------------------+
|       `messages`      |
+------+-----+----------+
| from | to  | text     |
+------+-----+----------+
| 1    | 5   | hi       |
+------+-----+----------+
| 2    | 5   | how r u  |
+------+-----+----------+
| 5    | 2   | Good, u? |
+------+-----+----------+

Those two tables should become:

+-----------------------+
|        `users`        |
+------+----------------+
| id   | name           |
+------+----------------+
| 1    | Star Lord      |
+------+----------------+
| 3    | Iron Man       |
+------+----------------+

+-----------------------+
|       `messages`      |
+------+-----+----------+
| from | to  | text     |
+------+-----+----------+
| 1    | 5   | hi       |
+------+-----+----------+
| 1    | 5   | how r u  |
+------+-----+----------+
| 5    | 1   | Good, u? |
+------+-----+----------+

Can this be done? I'm happy to use PHP as needed.
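For what it's worth, the core of the transformation is just a map from every old id to the id being kept. Here is a minimal pure-Python sketch of that idea (the question allows PHP; Python is used here only for illustration), assuming "duplicate" means "same `name`" and that the lowest id is the one to keep, as in the tables above. `build_id_map` and `dedupe` are hypothetical helper names.

```python
from collections import defaultdict

# Example data from the question; user 5 is referenced but not listed in users.
users = [(1, "Star Lord"), (2, "Star Lord"), (3, "Iron Man")]
messages = [(1, 5, "hi"), (2, 5, "how r u"), (5, 2, "Good, u?")]


def build_id_map(users):
    """Map every user id to the lowest id that shares its name."""
    ids_by_name = defaultdict(list)
    for user_id, name in users:
        ids_by_name[name].append(user_id)
    id_map = {}
    for ids in ids_by_name.values():
        keeper = min(ids)
        for user_id in ids:
            id_map[user_id] = keeper
    return id_map


def dedupe(users, messages):
    """Drop duplicate users and repoint message senders/recipients."""
    id_map = build_id_map(users)
    # A user survives only if it maps to itself, i.e. it is the keeper.
    kept = [(uid, name) for uid, name in users if id_map[uid] == uid]
    # Ids missing from the map (e.g. 5) are passed through unchanged.
    remapped = [(id_map.get(frm, frm), id_map.get(to, to), text)
                for frm, to, text in messages]
    return kept, remapped


kept_users, new_messages = dedupe(users, messages)
print(kept_users)    # [(1, 'Star Lord'), (3, 'Iron Man')]
print(new_messages)  # [(1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?')]
```

Ids that aren't in `users` at all (5 in the example) pass through unchanged thanks to `id_map.get(x, x)`.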

I found the following, but it's only for finding foreign key usage, not replacing instances for specific key values: MySQL: How to I find all tables that have foreign keys that reference particular table.column AND have values for those foreign keys?
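If the foreign keys are actually declared, the schema catalogue can drive the replacement. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in for MySQL: `PRAGMA foreign_key_list` lists each table's declared foreign keys, and every column that references `users.id` gets a plain `UPDATE ... WHERE col = old_id`. In MySQL the equivalent lookup would be `information_schema.KEY_COLUMN_USAGE` filtered on `REFERENCED_TABLE_NAME` and `REFERENCED_COLUMN_NAME`. `referencing_columns` and `replace_references` are hypothetical helpers, and columns that hold user ids without a declared foreign key will not be found.

```python
import sqlite3


def quote(identifier):
    """Quote an SQLite identifier (needed for reserved words like "from")."""
    return '"' + identifier.replace('"', '""') + '"'


def referencing_columns(conn, parent_table, parent_column="id"):
    """Return sorted (table, column) pairs whose declared foreign keys
    point at parent_table.parent_column."""
    found = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, hence quote().
        for fk in conn.execute(f"PRAGMA foreign_key_list({quote(table)})"):
            # Row layout: id, seq, table, from, to, on_update, on_delete, match
            ref_table, from_col, to_col = fk[2], fk[3], fk[4]
            if ref_table == parent_table and to_col == parent_column:
                found.append((table, from_col))
    return sorted(found)


def replace_references(conn, parent_table, old_id, new_id):
    """Point every referencing column at new_id instead of old_id."""
    for table, column in referencing_columns(conn, parent_table):
        conn.execute(
            f"UPDATE {quote(table)} SET {quote(column)} = ? "
            f"WHERE {quote(column)} = ?",
            (new_id, old_id))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = OFF;  -- the sample data references user 5, who isn't listed
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages (
        "from" INTEGER REFERENCES users(id),
        "to"   INTEGER REFERENCES users(id),
        text   TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (2, 'Star Lord'), (3, 'Iron Man');
    INSERT INTO messages VALUES (1, 5, 'hi'), (2, 5, 'how r u'), (5, 2, 'Good, u?');
""")
print(referencing_columns(conn, "users"))  # [('messages', 'from'), ('messages', 'to')]
replace_references(conn, "users", old_id=2, new_id=1)
conn.execute("DELETE FROM users WHERE id = 2")
print(conn.execute('SELECT "from", "to", text FROM messages ORDER BY rowid').fetchall())
# [(1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?')]
```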

Bonus Points

There may be additional data which needs to be merged in the users table. For example, Star Lord with ID #1 might have a phone field filled in, but Star Lord with ID #2 has an email field.

Worst case: both have the same field filled in, with conflicting values.
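One way to handle this is to merge field by field: take the single non-empty value where there is one, and set the field aside for manual review where there are several. A minimal sketch in Python; `merge_records` is a hypothetical helper, and the `phone`/`email` values below are made up.

```python
def merge_records(records, fields):
    """Merge duplicate records field by field.

    Returns (merged, conflicts). A field with exactly one distinct non-empty
    value gets that value; a field with several is set to None in `merged`
    and its candidate values are listed in `conflicts` for manual review.
    """
    merged, conflicts = {}, {}
    for field in fields:
        values = []
        for record in records:
            value = record.get(field)
            if value not in (None, "") and value not in values:
                values.append(value)
        if len(values) > 1:
            merged[field] = None
            conflicts[field] = values
        else:
            merged[field] = values[0] if values else None
    return merged, conflicts


# Hypothetical duplicate rows (values made up for illustration).
star_lord_1 = {"id": 1, "name": "Star Lord", "phone": "555-0100", "email": None}
star_lord_2 = {"id": 2, "name": "Star Lord", "phone": None, "email": "quill@example.com"}

merged, conflicts = merge_records([star_lord_1, star_lord_2], ["name", "phone", "email"])
print(merged)     # {'name': 'Star Lord', 'phone': '555-0100', 'email': 'quill@example.com'}
print(conflicts)  # {}
```

With conflicting input, e.g. two different phone numbers, `merged["phone"]` comes back as `None` and `conflicts["phone"]` holds both candidates.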

I suggest:

  1. Create a table of correct data. A good starting point might be:

     CREATE TABLE users_new LIKE users;
     ALTER TABLE users_new ADD UNIQUE (name);
     INSERT INTO users_new (id, name, phone, email)
       SELECT MIN(id), name, GROUP_CONCAT(phone), GROUP_CONCAT(email)
       FROM users
       GROUP BY name;

    Note that, due to your "worst case" observation under "Bonus Points", you may well want to manually verify the contents of this table before archiving the underlying users data (I advise against permanent deletion, just in case).

  2. Update existing foreign relationships:

     UPDATE messages
       JOIN (users uf JOIN users_new unf USING (name)) ON uf.id = messages.from
       JOIN (users ut JOIN users_new unt USING (name)) ON ut.id = messages.to
     SET messages.from = unf.id,
         messages.to   = unt.id;

    If you have a lot of tables to update, you could cache the results of the join between users and users_new, either:

    • in a new_id column within the old users table:

       ALTER TABLE users ADD new_id BIGINT UNSIGNED;
       UPDATE users JOIN users_new USING (name) SET users.new_id = users_new.id;
       UPDATE messages
         JOIN users uf ON uf.id = messages.from
         JOIN users ut ON ut.id = messages.to
       SET messages.from = uf.new_id,
           messages.to   = ut.new_id;
    • or else in a new (temporary) table:

       CREATE TEMPORARY TABLE newid_cache (
         PRIMARY KEY (old_id),
         KEY (old_id, new_id)
       ) ENGINE=MEMORY
       SELECT users.id AS old_id, users_new.id AS new_id
       FROM users JOIN users_new USING (name);
       -- MySQL cannot refer to a TEMPORARY table twice in one statement
       -- ("Can't reopen table"), so update each column separately.
       UPDATE messages JOIN newid_cache nf ON nf.old_id = messages.from
       SET messages.from = nf.new_id;
       UPDATE messages JOIN newid_cache nt ON nt.old_id = messages.to
       SET messages.to = nt.new_id;
  3. Either replace users with users_new, or else modify your application to use the new table in place of the old one.

     ALTER TABLE users RENAME TO users_old;
     ALTER TABLE users_new RENAME TO users;
  4. Update any foreign key constraints as appropriate.
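The steps above can be tried end to end on a throwaway database. This is a minimal sketch using Python's `sqlite3` module in place of MySQL, so the syntax is adapted: SQLite has no `CREATE TABLE ... LIKE`, no `ENGINE=MEMORY` and no `UPDATE ... JOIN`, so the cached ids are applied with correlated subqueries instead. The `phone`/`email` values are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT);
    CREATE TABLE messages ("from" INTEGER, "to" INTEGER, text TEXT);
    INSERT INTO users VALUES
        (1, 'Star Lord', '555-0100', NULL),
        (2, 'Star Lord', NULL, 'quill@example.com'),
        (3, 'Iron Man', NULL, NULL);
    INSERT INTO messages VALUES (1, 5, 'hi'), (2, 5, 'how r u'), (5, 2, 'Good, u?');

    -- Step 1: one row per name; group_concat() skips NULLs, so conflicting
    -- values would end up comma-separated for manual review.
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT UNIQUE, phone TEXT, email TEXT);
    INSERT INTO users_new (id, name, phone, email)
        SELECT MIN(id), name, group_concat(phone), group_concat(email)
        FROM users GROUP BY name;

    -- Step 2: cache old id -> new id, then remap each column separately.
    CREATE TEMP TABLE newid_cache (old_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL);
    INSERT INTO newid_cache (old_id, new_id)
        SELECT users.id, users_new.id FROM users JOIN users_new USING (name);
    UPDATE messages SET "from" = (SELECT new_id FROM newid_cache WHERE old_id = messages."from")
        WHERE "from" IN (SELECT old_id FROM newid_cache);
    UPDATE messages SET "to" = (SELECT new_id FROM newid_cache WHERE old_id = messages."to")
        WHERE "to" IN (SELECT old_id FROM newid_cache);

    -- Step 3: swap the tables, keeping the old data archived as users_old.
    ALTER TABLE users RENAME TO users_old;
    ALTER TABLE users_new RENAME TO users;
""")

print(conn.execute("SELECT id, name, phone, email FROM users ORDER BY id").fetchall())
# [(1, 'Star Lord', '555-0100', 'quill@example.com'), (3, 'Iron Man', None, None)]
print(conn.execute('SELECT "from", "to", text FROM messages ORDER BY rowid').fetchall())
# [(1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?')]
```

The `WHERE ... IN (SELECT old_id ...)` guards matter: without them, messages pointing at ids that aren't in `users` (5 here) would have that column set to NULL by the subquery.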

I like to be really methodical about this. While you could write it all in one complex query, that's an optimisation and, unless the need for it is obvious, an unnecessary one.

First, back up your database :) Then create a table to hold the ids of the users you are going to keep.

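Fill it with, say, the lowest id for each name (match keeper_id's type to that of users.id):

CREATE TABLE keepers (keeper_id INT PRIMARY KEY);
INSERT INTO keepers (keeper_id) SELECT MIN(id) FROM `users` GROUP BY `name`;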
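A minimal sketch of these two steps in Python's `sqlite3` module (standing in for MySQL, where you would more likely reach for `mysqldump` for the backup): `Connection.backup()` copies the whole database, then `keepers` is filled with the lowest id per name. The connection names `live` and `backup` are just illustrative.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (2, 'Star Lord'), (3, 'Iron Man');
""")

# Back up first: Connection.backup() copies the whole database to `backup`.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Table of the ids to keep: the lowest id for each name.
live.executescript("""
    CREATE TABLE keepers (keeper_id INTEGER PRIMARY KEY);
    INSERT INTO keepers (keeper_id) SELECT MIN(id) FROM users GROUP BY name;
""")
keepers = [row[0] for row in live.execute("SELECT keeper_id FROM keepers ORDER BY keeper_id")]
print(keepers)  # [1, 3]
```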

After that, it's just a couple of UPDATEs with joins, for example:

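-- keepers only lists the ids to keep, so go through users (same name) to find each old id's keeper
UPDATE
   `messages` m JOIN
   `users` u ON u.id = m.from JOIN
   `users` ku ON ku.`name` = u.`name` JOIN
   keepers k ON k.keeper_id = ku.id
SET m.from = k.keeper_id;

UPDATE
   `messages` m JOIN
   `users` u ON u.id = m.to JOIN
   `users` ku ON ku.`name` = u.`name` JOIN
   keepers k ON k.keeper_id = ku.id
SET m.to = k.keeper_id;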
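Since `keepers` only lists the ids to keep, the keeper for an arbitrary old id has to be found through `users`: look up the old id's name, then the kept id with that name. A minimal sketch of that lookup in SQLite via Python's `sqlite3` module (a stand-in for MySQL); SQLite has no `UPDATE ... JOIN`, so it uses a correlated subquery per column, and the `WHERE ... IN (SELECT id FROM users)` clause leaves ids that aren't in `users` untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages ("from" INTEGER, "to" INTEGER, text TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (2, 'Star Lord'), (3, 'Iron Man');
    INSERT INTO messages VALUES (1, 5, 'hi'), (2, 5, 'how r u'), (5, 2, 'Good, u?');
    CREATE TABLE keepers (keeper_id INTEGER PRIMARY KEY);
    INSERT INTO keepers (keeper_id) SELECT MIN(id) FROM users GROUP BY name;
""")

# For each referencing column, replace the id with the keeper that has the
# same name; ids that are not in users (e.g. 5) are skipped by the WHERE.
for column in ('"from"', '"to"'):
    conn.execute(f"""
        UPDATE messages SET {column} = (
            SELECT k.keeper_id
            FROM users AS u
            JOIN users AS ku ON ku.name = u.name
            JOIN keepers AS k ON k.keeper_id = ku.id
            WHERE u.id = messages.{column})
        WHERE {column} IN (SELECT id FROM users)
    """)

print(conn.execute('SELECT "from", "to", text FROM messages ORDER BY rowid').fetchall())
# [(1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?')]
```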

Then get rid of the users you don't want:

DELETE u
FROM `users` u
LEFT JOIN keepers k ON k.keeper_id = u.id
WHERE k.keeper_id IS NULL;
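The delete is an anti-join: remove every user that has no row in `keepers`. SQLite's `DELETE` has no `JOIN` clause, so this minimal sketch (Python `sqlite3`, standing in for MySQL) expresses the same thing with `NOT EXISTS`, which, unlike `NOT IN`, is not thrown off by NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'Star Lord'), (2, 'Star Lord'), (3, 'Iron Man');
    CREATE TABLE keepers (keeper_id INTEGER PRIMARY KEY);
    INSERT INTO keepers (keeper_id) SELECT MIN(id) FROM users GROUP BY name;
""")

# Anti-join: delete every user with no matching row in keepers.
deleted = conn.execute("""
    DELETE FROM users
    WHERE NOT EXISTS (SELECT 1 FROM keepers AS k WHERE k.keeper_id = users.id)
""").rowcount
print(deleted)  # 1
print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
# [(1, 'Star Lord'), (3, 'Iron Man')]
```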

When all is good (e.g. you have the same number of messages as you started with, no one is talking to themselves, etc.), delete the keepers table.
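The "when all is good" checks can be scripted too. A minimal sketch in Python's `sqlite3` (stand-in for MySQL); `sanity_check` is a hypothetical helper that compares the message count with the count taken before the merge, and looks for names that are still duplicated, messages still pointing at removed ids, and messages that now go from a user to themselves. `keepers` is dropped only if nothing is reported.

```python
import sqlite3


def sanity_check(conn, messages_before, removed_ids):
    """Return a list of problems found after the merge (empty list = all good)."""
    problems = []
    messages_after = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    if messages_after != messages_before:
        problems.append(f"message count changed: {messages_before} -> {messages_after}")
    dupes = conn.execute(
        "SELECT name FROM users GROUP BY name HAVING COUNT(*) > 1").fetchall()
    if dupes:
        problems.append(f"duplicate names remain: {[row[0] for row in dupes]}")
    if removed_ids:
        placeholders = ", ".join("?" for _ in removed_ids)
        dangling = conn.execute(
            f'SELECT COUNT(*) FROM messages WHERE "from" IN ({placeholders}) '
            f'OR "to" IN ({placeholders})',
            (*removed_ids, *removed_ids)).fetchone()[0]
        if dangling:
            problems.append(f"{dangling} message(s) still reference removed users")
    self_talk = conn.execute(
        'SELECT COUNT(*) FROM messages WHERE "from" = "to"').fetchone()[0]
    if self_talk:
        problems.append(f"{self_talk} message(s) now go from a user to themselves")
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages ("from" INTEGER, "to" INTEGER, text TEXT);
    CREATE TABLE keepers (keeper_id INTEGER PRIMARY KEY);
    -- State after the remapping and delete steps (user 2 merged into 1).
    INSERT INTO users VALUES (1, 'Star Lord'), (3, 'Iron Man');
    INSERT INTO messages VALUES (1, 5, 'hi'), (1, 5, 'how r u'), (5, 1, 'Good, u?');
    INSERT INTO keepers VALUES (1), (3);
""")
problems = sanity_check(conn, messages_before=3, removed_ids=[2])
print(problems)  # []
if not problems:
    conn.execute("DROP TABLE keepers")
```

A self-addressed message is not necessarily wrong (two duplicates may have messaged each other), so that check is better treated as "look at these" than as a hard failure.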

Syntax not checked, but it should be close.

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM