Two tables with similar columns but different primary keys
I have two tables from two different databases, and both contain lastName and firstName columns. I need to create a JOIN relationship between the two. The lastName columns match about 80% of the time, while the firstName columns match only about 20% of the time. Each table also has a completely different personID primary key.
Generally speaking, what would be some "best practices" and/or tips to use when I add a foreign key to one of the tables? Since I have about 4,000 distinct persons, any labor-saving tips would be greatly appreciated.
Sample mismatched data:
db1.table1                    db2.table2
23  Williams      Fritz       98  Williams  Frederick
25  Wilson-Smith  James       12  Smith     James Wilson
26  Winston       Trudy       73  Winston   Gertrude
Keep in mind: sometimes they match exactly, often they don't, and sometimes two different people will have the same first/last name.
You can join on multiple fields.
select *
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
From this you can determine how many 'duplicate' first name / last name combinations there are.
select table1.firstName, table2.lastName, count(*)
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) > 1
Conversely, you can also determine the ones that match identically, and only once:
select table1.firstName, table2.lastName
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) = 1
This last query could be the basis for performing the bulk of your foreign key updates.
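As a rough illustration of how the "matches exactly once" query could drive the bulk update, here is a minimal sketch using Python's built-in sqlite3 module. The foreign key column table2PersonID is a hypothetical name, and the Young and Zane rows are made up for the example; none of this comes from the original tables, and the SQL may need adjusting for your database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        personID INTEGER PRIMARY KEY,
        lastName TEXT,
        firstName TEXT,
        table2PersonID INTEGER  -- hypothetical foreign key column
    );
    CREATE TABLE table2 (
        personID INTEGER PRIMARY KEY,
        lastName TEXT,
        firstName TEXT
    );
    -- Illustrative rows only: Young and Zane are invented for this sketch.
    INSERT INTO table1 (personID, lastName, firstName) VALUES
        (23, 'Williams', 'Fritz'),
        (30, 'Young', 'Ann'),
        (31, 'Zane', 'Bob'),
        (32, 'Zane', 'Bob');
    INSERT INTO table2 (personID, lastName, firstName) VALUES
        (98, 'Williams', 'Frederick'),
        (55, 'Young', 'Ann'),
        (56, 'Zane', 'Bob');
""")

# Same join as above, restricted to name pairs that match exactly once.
# Each group holds a single joined row, so MIN() simply returns that row's IDs.
pairs = conn.execute("""
    SELECT MIN(table2.personID), MIN(table1.personID)
    FROM table1
    INNER JOIN table2
        ON table1.firstName = table2.firstName
       AND table1.lastName = table2.lastName
    GROUP BY table1.firstName, table1.lastName
    HAVING COUNT(*) = 1
""").fetchall()

# Write the matched table2 key into the hypothetical foreign key column.
conn.executemany(
    "UPDATE table1 SET table2PersonID = ? WHERE personID = ?", pairs
)
conn.commit()

linked = conn.execute(
    "SELECT personID, table2PersonID FROM table1 ORDER BY personID"
).fetchall()
print(linked)
```

In this sample only Ann Young is linked. Fritz Williams has no exact match, and the two Bob Zane rows both join to the same table2 row, so the count for that name is 2 and they are left for review.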
Names that match more than once between the tables will likely need some sort of manual intervention, unless there are other fields in the tables that can be used to differentiate them?
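One possible way to reduce the manual work, sketched here with Python's sqlite3 module, is to list every row that still has no foreign key next to the rows in the other table that share its last name, since the question says last names match far more often than first names. The column table2PersonID is a hypothetical foreign key name, the rows are the question's sample data, and reading "12 Smith James Wilson" as last name Smith with first name James Wilson is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        personID INTEGER PRIMARY KEY,
        lastName TEXT,
        firstName TEXT,
        table2PersonID INTEGER  -- hypothetical foreign key column, still NULL
    );
    CREATE TABLE table2 (
        personID INTEGER PRIMARY KEY,
        lastName TEXT,
        firstName TEXT
    );
    INSERT INTO table1 (personID, lastName, firstName) VALUES
        (23, 'Williams', 'Fritz'),
        (25, 'Wilson-Smith', 'James'),
        (26, 'Winston', 'Trudy');
    INSERT INTO table2 (personID, lastName, firstName) VALUES
        (98, 'Williams', 'Frederick'),
        (12, 'Smith', 'James Wilson'),
        (73, 'Winston', 'Gertrude');
""")

# For each unlinked table1 row, show table2 rows with the same last name.
# The LEFT JOIN keeps rows with no candidate at all (their table2 columns are NULL).
candidates = conn.execute("""
    SELECT t1.personID, t1.lastName, t1.firstName,
           t2.personID, t2.firstName
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
        ON t2.lastName = t1.lastName
    WHERE t1.table2PersonID IS NULL
    ORDER BY t1.personID, t2.personID
""").fetchall()

for row in candidates:
    print(row)
```

A reviewer can then confirm pairs such as Fritz/Frederick Williams by eye, while rows like Wilson-Smith, which have no last-name candidate, still need to be looked up by hand.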