
Merge with multiple matching conditions

I have to write a T-SQL MERGE statement that matches rows on several alternative sets of conditions.

Table column names: ID, emailaddress, firstname, surname, title, mobile, dob, accountnumber, address, postcode

The main problem is that the database I am working with has no mandatory fields and no primary keys to compare on, and the source table can contain duplicate records as well. As a result, there are many combinations to check when looking for duplicates of source rows in the target table. My manager has come up with the following scenarios:

  1. Two people may share the same email address, so a match on emailaddress, firstname and surname together is a 100% match (assuming all other columns are empty).

  2. A match on mobile and accountnumber together is a 100% match (assuming all other columns are empty).

  3. A match on title, surname, postcode and dob together is a 100% match (assuming all other columns are empty).
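
The three rules above could be expressed as one MERGE whose ON clause ORs the rule groups together. This is only a sketch under assumptions: the table names dbo.SourceCustomers and dbo.TargetCustomers are hypothetical, ID is assumed to be generated by the target table, and the source is assumed to have been de-duplicated already.

```sql
-- Hypothetical names: dbo.SourceCustomers (source), dbo.TargetCustomers (target).
-- Assumption: ID is generated by the target (for example an IDENTITY column),
-- so it is left out of the INSERT column list.
MERGE dbo.TargetCustomers AS t
USING dbo.SourceCustomers AS s
   ON (t.emailaddress = s.emailaddress          -- rule 1
       AND t.firstname = s.firstname
       AND t.surname   = s.surname)
   OR (t.mobile = s.mobile                      -- rule 2
       AND t.accountnumber = s.accountnumber)
   OR (t.title = s.title                        -- rule 3
       AND t.surname  = s.surname
       AND t.postcode = s.postcode
       AND t.dob      = s.dob)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (emailaddress, firstname, surname, title, mobile,
            dob, accountnumber, address, postcode)
    VALUES (s.emailaddress, s.firstname, s.surname, s.title, s.mobile,
            s.dob, s.accountnumber, s.address, s.postcode);
```

Because `=` is never true when either side is NULL, a rule only fires when every column in it is populated on both sides; empty strings would still compare equal, so they may need converting to NULL first. Source rows that duplicate each other but match nothing in the target would all be inserted, which is why the source has to be cleansed beforehand. An OR-based join condition may also perform poorly on large tables.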

I was given this task without being able to see the data, because I am a new recruit and my employer does not want me to see it for the moment. So I am working largely from imagination.

The solution: rather than checking each source record against the target database, I am now thinking of cleansing the source data first with statements in a stored procedure. If a record meets one duplicate condition, the remaining duplicate-removal statements are skipped, and the cleansed data is then inserted into the target table.

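-- Number the rows within each (emailaddress, sname) group and delete all but the first.
-- Ordering by a partition column means the surviving row is arbitrary.
with cte_duplicate1 AS
    (
        select emailaddress, sname, ROW_NUMBER() over(partition by emailaddress, sname order by emailaddress) as dup1
        from DuplicateRecordTable1
    )
delete from cte_duplicate1
where dup1 > 1;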

(If the first statement, cte_duplicate1, removes a duplicate, the second statement, cte_duplicate2, is skipped.)

-- Same approach, this time grouping on (emailaddress, fname).
with cte_duplicate2 AS
    (
        select emailaddress, fname, ROW_NUMBER() over(partition by emailaddress, fname order by emailaddress) as dup2
        from DuplicateRecordTable1
    )
delete from cte_duplicate2
where dup2 > 1;

That is the vague plan at the moment. I do not yet know whether it is achievable.
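
One way the cleansing plan might be made concrete is to run one delete per matching rule, in order. This is a sketch, not a tested solution: it assumes the columns listed at the top exist on DuplicateRecordTable1 and that the row with the lowest ID is the one to keep. No explicit skip logic is used, because a row deleted by an earlier rule is simply no longer there when the later rules run.

```sql
-- Rule 1: same emailaddress + firstname + surname.
-- The IS NOT NULL filters matter: PARTITION BY groups NULLs together,
-- so without them all rows with empty key columns would count as duplicates.
WITH rule1 AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY emailaddress, firstname, surname
                                  ORDER BY ID) AS rn
        FROM DuplicateRecordTable1
        WHERE emailaddress IS NOT NULL
          AND firstname IS NOT NULL
          AND surname IS NOT NULL
    )
DELETE FROM rule1
WHERE rn > 1;

-- Rule 2: same mobile + accountnumber.
WITH rule2 AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY mobile, accountnumber
                                  ORDER BY ID) AS rn
        FROM DuplicateRecordTable1
        WHERE mobile IS NOT NULL
          AND accountnumber IS NOT NULL
    )
DELETE FROM rule2
WHERE rn > 1;

-- Rule 3: same title + surname + postcode + dob.
WITH rule3 AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY title, surname, postcode, dob
                                  ORDER BY ID) AS rn
        FROM DuplicateRecordTable1
        WHERE title IS NOT NULL
          AND surname IS NOT NULL
          AND postcode IS NOT NULL
          AND dob IS NOT NULL
    )
DELETE FROM rule3
WHERE rn > 1;
```

If these statements are placed after other statements in a stored procedure, the statement before each WITH must end with a semicolon. Running them inside a transaction, and checking the rows first with a SELECT in place of the DELETE, would make the result easier to verify on data that cannot be inspected in advance.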
