Find duplicates, then update a table with the id from the main table and then delete records in a table

My issue is as follows. I have two tables: the personaldata table consists of employee records, and the whoisonboard table consists of records of when an employee has been on board. We have duplicates in the personaldata table, and these different ids are also stored in the whoisonboard table whenever people have been checked in. Finding the duplicates is not a problem.

First, delete all rows in the personaldata table that have no matching row in the whoisonboard table:

  DELETE FROM personaldata WHERE id NOT IN (SELECT personid FROM whoisonboard)

This deletes any person who has never been on a ship, because there is no record for them in the whoisonboard table.

Next, we delete any records in whoisonboard that do not have a corresponding record in personaldata, to make sure there are no orphaned whoisonboard records:

  DELETE FROM whoisonboard WHERE personid NOT IN (SELECT id FROM personaldata)

We can then find all the duplicates in the personaldata table together with their whoisonboard records. To identify duplicates, the query looks for rows where the fields names, date_of_birth and nationality are the same:

 select a.id as personid, b.id as whoisid, b.personid whoispersonid, a.names, a.date_of_birth, a.nationality 
 from personaldata a
 join whoisonboard b on a.id = b.personid 
   where  (a.names, a.date_of_birth, a.nationality) in (
     select a.names, a.date_of_birth, a.nationality
      from personaldata a
      group  by a.names, a.date_of_birth, a.nationality
      having count(distinct a.id) > 1
    )
  order by date_of_birth desc

We can then issue the following SQL statement to update the records one at a time, and later delete the orphaned duplicate records. If we have a lot of duplicates, doing this is time-consuming.

UPDATE whoisonboard SET personid = '74777a8e-343c-11e9-a2bb-000c2912dae9' 
WHERE `id` LIKE '5bd2c268-ec4d-11e8-ab89-000c29045ceb'

Then, at the end, I would just delete the orphaned records with:

DELETE FROM personaldata WHERE id NOT IN (SELECT personid FROM whoisonboard)

I have been trying to build an SQL statement that does the update in one go, but it fails:

 update whoisonboard set personid = final_id 
 from whoisonboard 
 join personaldata on personaldata.id = whoisonboard.personid 
 join ( select names, date_of_birth, nationality, min(id) as final_id from 
 personaldata group by names, date_of_birth, nationality ) min_ids on 
 min_ids.names = personaldata.names

I get an error when executing it. I wonder whether what I am trying to do is possible in a single SQL statement. Although we try to avoid duplicates, they do happen, and it would be good to have a simple way to clean up the database.

I just did this to correct a similar problem in my data warehouse.

I'm including a lot of pseudocode because this is lengthy and I don't want to bother testing it for your case. Also, mine was for SQL Server, so the code probably won't work for you as-is. So here is the concept...

Create a temp table to store all natural key combinations and their ids (many ids per natural key).

create table #p (id [auto_increment], personkey, personid)
insert #p select lastname + ',' + firstname, personid 
from personaldata 
order by 1

Create a temp table to store the minimum id for each natural key value (one id per natural key).

create table #pmin (id [auto_increment], personkey, personid)
insert #pmin
select personkey, min(personid) as personid
from #p
group by personkey
order by 1
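As a rough illustration of these two temp tables, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The sample rows are invented, and the natural key is assumed to be names, date_of_birth and nationality (the fields from the question) rather than lastname and firstname; the temp tables are called p and pmin because SQLite has no # prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; column names follow the question.
    CREATE TABLE personaldata (
        id TEXT PRIMARY KEY, names TEXT, date_of_birth TEXT, nationality TEXT
    );
    INSERT INTO personaldata VALUES
        ('p1', 'Ann Lee', '1980-01-01', 'NO'),
        ('p2', 'Ann Lee', '1980-01-01', 'NO'),   -- duplicate of p1
        ('p3', 'Bob Roy', '1975-05-05', 'UK');

    -- One row per (natural key, id): many ids per key.
    CREATE TEMP TABLE p AS
        SELECT names || '|' || date_of_birth || '|' || nationality AS personkey,
               id AS personid
        FROM personaldata;

    -- One row per natural key: the id that will survive.
    CREATE TEMP TABLE pmin AS
        SELECT personkey, MIN(personid) AS personid
        FROM p
        GROUP BY personkey;
""")

pmin_rows = conn.execute(
    "SELECT personkey, personid FROM pmin ORDER BY personkey"
).fetchall()
for row in pmin_rows:
    print(row)
```

Note that concatenating a NULL column yields a NULL key in SQLite, so rows with missing key fields would need separate handling.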

Loop through the records of #pmin, update whoisonboard, and tidy personaldata.

declare variables
initialize variables

loop through #pmin from id = 1 to [max]
begin loop
    increment counter
    store the values of personkey and personid for this iteration
        select @thisVal = personkey, @idMin = personid from #pmin where id = @i
    store all values of personid for this personkey from #p (I used a table variable @a, emptied first so ids from earlier iterations don't carry over)
        delete from @a
        insert @a select personid from #p where personkey = @thisVal
    update whoisonboard set personid = min personid for all values of personid
        update whoisonboard set personid = @idMin where personid in (select personid from @a)
    delete all but the first personaldata record for this iteration
        delete personaldata where personid in (select personid from @a where personid <> @idMin)
end loop
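The loop above can probably be replaced by a few set-based statements. The following is only a sketch under assumptions, not the code I actually ran: it uses Python's sqlite3 module with invented sample data, the column names from the question, and a hypothetical helper table idmap that maps each duplicate id to the lowest id sharing its names, date_of_birth and nationality. With UUID ids, MIN(id) picks the lexicographically smallest id, which is not necessarily the oldest record.

```python
import sqlite3

def merge_duplicates(conn):
    """Repoint whoisonboard at one id per natural key, then drop the extra rows."""
    with conn:  # single transaction: commits on success, rolls back on error
        # idmap (hypothetical name): duplicate id -> surviving id.
        conn.execute("""
            CREATE TEMP TABLE idmap AS
            SELECT p.id AS old_id, m.final_id
            FROM personaldata p
            JOIN (SELECT names, date_of_birth, nationality, MIN(id) AS final_id
                  FROM personaldata
                  GROUP BY names, date_of_birth, nationality) m
              ON m.names = p.names
             AND m.date_of_birth = p.date_of_birth
             AND m.nationality = p.nationality
            WHERE p.id <> m.final_id
        """)
        # Only rows that point at a duplicate id are touched.
        conn.execute("""
            UPDATE whoisonboard
            SET personid = (SELECT final_id FROM idmap
                            WHERE old_id = whoisonboard.personid)
            WHERE personid IN (SELECT old_id FROM idmap)
        """)
        conn.execute(
            "DELETE FROM personaldata WHERE id IN (SELECT old_id FROM idmap)"
        )
        conn.execute("DROP TABLE idmap")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personaldata (
        id TEXT PRIMARY KEY, names TEXT, date_of_birth TEXT, nationality TEXT
    );
    CREATE TABLE whoisonboard (id TEXT PRIMARY KEY, personid TEXT);
    INSERT INTO personaldata VALUES
        ('p1', 'Ann Lee', '1980-01-01', 'NO'),
        ('p2', 'Ann Lee', '1980-01-01', 'NO'),   -- duplicate of p1
        ('p3', 'Bob Roy', '1975-05-05', 'UK');
    INSERT INTO whoisonboard VALUES ('w1', 'p1'), ('w2', 'p2'), ('w3', 'p3');
""")

merge_duplicates(conn)
onboard = conn.execute(
    "SELECT id, personid FROM whoisonboard ORDER BY id"
).fetchall()
people = [r[0] for r in conn.execute("SELECT id FROM personaldata ORDER BY id")]
print(onboard)  # [('w1', 'p1'), ('w2', 'p1'), ('w3', 'p3')]
print(people)   # ['p1', 'p3']
```

Because the join uses plain equality, rows with a NULL in any key column never land in idmap and are left untouched. The same three statements should translate to other databases, but the exact syntax for temp tables and correlated updates differs, so treat this as an outline to adapt and test on a copy of the data first.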

My code also included some other steps that I needed to perform for my case, as well as a lot of testing/data comparison code to verify I did the right thing at each step.

  • Report dates, pay, or whatever for each person before and after. They should match exactly.
  • Verify you got the first and last record.
  • Other checks as you see fit.
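Checks of this kind could look roughly like the sketch below. It is an illustration only, again using Python's sqlite3 with invented data and the question's column names; the helper names onboard_counts, remaining_duplicates and orphans are hypothetical. The idea is to snapshot the per-person whoisonboard counts before the merge and compare them afterwards.

```python
import sqlite3

def onboard_counts(conn):
    """Number of whoisonboard rows per natural key (should not change)."""
    return conn.execute("""
        SELECT p.names, p.date_of_birth, p.nationality, COUNT(*)
        FROM whoisonboard w
        JOIN personaldata p ON p.id = w.personid
        GROUP BY p.names, p.date_of_birth, p.nationality
        ORDER BY 1, 2, 3
    """).fetchall()

def remaining_duplicates(conn):
    """Natural keys that still have more than one personaldata row."""
    return conn.execute("""
        SELECT names, date_of_birth, nationality, COUNT(*)
        FROM personaldata
        GROUP BY names, date_of_birth, nationality
        HAVING COUNT(*) > 1
    """).fetchall()

def orphans(conn):
    """whoisonboard rows whose personid no longer exists in personaldata."""
    return conn.execute("""
        SELECT w.id FROM whoisonboard w
        WHERE NOT EXISTS (SELECT 1 FROM personaldata p WHERE p.id = w.personid)
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personaldata (
        id TEXT PRIMARY KEY, names TEXT, date_of_birth TEXT, nationality TEXT
    );
    CREATE TABLE whoisonboard (id TEXT PRIMARY KEY, personid TEXT);
    INSERT INTO personaldata VALUES
        ('p1', 'Ann Lee', '1980-01-01', 'NO'),
        ('p2', 'Ann Lee', '1980-01-01', 'NO'),   -- duplicate of p1
        ('p3', 'Bob Roy', '1975-05-05', 'UK');
    INSERT INTO whoisonboard VALUES ('w1', 'p1'), ('w2', 'p2'), ('w3', 'p3');
""")

before = onboard_counts(conn)

# Stand-in for the real merge: fold p2 into p1 by hand.
conn.execute("UPDATE whoisonboard SET personid = 'p1' WHERE personid = 'p2'")
conn.execute("DELETE FROM personaldata WHERE id = 'p2'")

after = onboard_counts(conn)
print(before == after)             # True: nobody gained or lost records
print(remaining_duplicates(conn))  # []
print(orphans(conn))               # []
```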

Altogether, my code was about 600 lines. (That's why I didn't want to go to that extent here.) But what I have provided here should be a sufficient outline to accomplish your task.
