
Merge with multiple matching conditions

I have to write a T-SQL MERGE statement where multiple conditions have to be met for a match.

Table column names: ID, emailaddress, firstname, surname, title, mobile, dob, accountnumber, address, postcode

The main problem here is that the database I am working with has no mandatory fields, there are no primary keys to compare on, and the source table can contain duplicate records as well. As a result, there are many column combinations to check when looking for duplicates of the source table against the target table. My manager has come up with the following matching scenarios (a MERGE sketch covering them follows the list):

  1. We could have data where two people use the same email address, so a match on emailaddress, firstname and surname is a 100% match (assuming all other columns are empty).

  2. Data where mobile and accountnumber match is a 100% match (assuming all other columns are empty).

  3. A match on title, surname, postcode and dob is a 100% match (assuming all other columns are empty).
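The three scenarios above can be expressed in a single MERGE by combining the condition groups with OR in the ON clause. Below is a minimal sketch only, assuming a hypothetical target table dbo.Customers and a hypothetical staging (source) table dbo.CustomersStaging with the columns listed above; the table names and the WHEN MATCHED update list are placeholders, not the actual schema.

    MERGE dbo.Customers AS tgt
    USING dbo.CustomersStaging AS src
        ON  (tgt.emailaddress = src.emailaddress
             AND tgt.firstname = src.firstname
             AND tgt.surname   = src.surname)                -- scenario 1
         OR (tgt.mobile = src.mobile
             AND tgt.accountnumber = src.accountnumber)      -- scenario 2
         OR (tgt.title    = src.title
             AND tgt.surname  = src.surname
             AND tgt.postcode = src.postcode
             AND tgt.dob      = src.dob)                     -- scenario 3
    WHEN MATCHED THEN
        UPDATE SET tgt.address = src.address                 -- placeholder update
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (emailaddress, firstname, surname, title, mobile, dob,
                accountnumber, address, postcode)
        VALUES (src.emailaddress, src.firstname, src.surname, src.title, src.mobile,
                src.dob, src.accountnumber, src.address, src.postcode);

Note that MERGE raises an error if more than one source row matches the same target row ("attempted to UPDATE or DELETE the same row more than once"), which is exactly why de-duplicating the source first, as described below, matters.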

I was given this task without being able to see the data, because I am a new recruit and my employer does not want me to see the data for the moment. So I am working somewhat from imagination.

The solution: rather than checking the existing source records against the target database, I am now thinking of cleansing the source data with stored procedure statements, where if one duplicate condition is met it will skip the next duplicate-removal statements and then insert the data into the target table.

with cte_duplicate1 AS
    (
        -- number each (emailaddress, surname) group; rows after the first are duplicates
        select emailaddress, surname,
               ROW_NUMBER() over(partition by emailaddress, surname order by emailaddress) as dup1
        from DuplicateRecordTable1
    )
    delete from cte_duplicate1
    where dup1 > 1;

(If the first cte_duplicate1 statement was executed and removed rows, then the cte_duplicate2 statement should be skipped; a sketch of that control flow follows the second snippet.)

with cte_duplicate2 AS
    (
        -- same idea, but keyed on (emailaddress, firstname)
        select emailaddress, firstname,
               ROW_NUMBER() over(partition by emailaddress, firstname order by emailaddress) as dup2
        from DuplicateRecordTable1
    )
delete from cte_duplicate2
where dup2 > 1;
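One way to sketch the "skip the second statement if the first one fired" idea is to capture @@ROWCOUNT after the first DELETE and guard the second DELETE with an IF. The table and column names follow the snippets above; the exact skip rule (run the second pass only when the first removed nothing) is an assumption about the intended logic.

    DECLARE @removed int;

    WITH cte_duplicate1 AS
    (
        SELECT emailaddress, surname,
               ROW_NUMBER() OVER (PARTITION BY emailaddress, surname
                                  ORDER BY emailaddress) AS dup1
        FROM DuplicateRecordTable1
    )
    DELETE FROM cte_duplicate1
    WHERE dup1 > 1;

    SET @removed = @@ROWCOUNT;   -- rows deleted by the first de-duplication pass

    IF @removed = 0              -- only run the second pass if the first removed nothing
    BEGIN
        WITH cte_duplicate2 AS
        (
            SELECT emailaddress, firstname,
                   ROW_NUMBER() OVER (PARTITION BY emailaddress, firstname
                                      ORDER BY emailaddress) AS dup2
            FROM DuplicateRecordTable1
        )
        DELETE FROM cte_duplicate2
        WHERE dup2 > 1;
    END;

Whether the second pass should run only when the first removed nothing, or always, is a business decision; the IF condition can be flipped or dropped accordingly.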

That is the vague plan at the moment. I do not know yet whether it is achievable or not.
