
Left join with right table top 1 values

I am joining an account master table of approximately 4 million rows with a transaction table. When I left join on the account number from the transaction table = the account number from the account master table, I uncover an anomaly in our data: the account master can hold 3 different entries for the same account number. These entries relate to characteristics of the account. While the address information may be the same, in some cases the spelling of the city differs. When I join the two tables I only want the first instance of each account number from the account master. I have seen some posts on using row_number(), but I am lost on how to use it properly here. This is what I am using, but I am getting three records for each account number.

     select am.[Customer_Name], am.[svc_city], sr.measure
from [dbo].[PP_SUMMARY_RESIDENTIAL] sr
left join [CIS].[dbo].[Account_Master] am on
(case when (left(sr.fred_account_number,2) = '00') then (right(sr.fred_account_number,len(sr.fred_account_number - 2)))
     when (left(sr.fred_account_number,1) = '0') then (right(sr.fred_account_number,len(sr.fred_account_number - 1)))
     else sr.fred_account_number
     end)
 = (select am.accountnumber, row_number() over (order by am.accountnumber) as row) where row = 1
 and sr.fred_account_number = '123456789' 

First of all, if there are several records for the same account then the DB schema and/or the applications that use it are in need of refurbishment.

Anyway, to select only one of several equivalent records, you can do something along these lines (simplified from your query):

with
acc_with_ord as ( 
    select
        col1, col2,..., 
        row_number() over (partition by <uniquely identifying columns> order by <some columns>) as ord
    from
        AccountMaster
),
unq_acc as (
    select * from acc_with_ord where ord = 1
)
select <something>
from
    pp_summary_residential
    left join unq_acc on
        <join conditions>

The first part assigns a surrogate order number to each of the records describing the same account (since we partition by the fields that uniquely identify the account), the second part keeps only one record per account, and the third part is the final select, which uses the unique account records in the join.
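As a rough sketch of how the template above might be filled in for the tables in the question: the table variables and sample rows below are hypothetical stand-ins for the real tables, the account number columns are assumed to be character data, and ordering by svc_city is only an assumed tie-breaker (the question does not say which duplicate should win).

```sql
-- Hypothetical stand-ins for [CIS].[dbo].[Account_Master] and
-- [dbo].[PP_SUMMARY_RESIDENTIAL]; column types are assumptions.
declare @Account_Master table (
    accountnumber varchar(20), Customer_Name varchar(50), svc_city varchar(50));
declare @PP_SUMMARY_RESIDENTIAL table (
    fred_account_number varchar(20), measure int);

-- Made-up rows: one account entered three times with different city spellings.
insert into @Account_Master values
    ('123456789', 'A. Customer', 'Springfield'),
    ('123456789', 'A. Customer', 'Springfeild'),
    ('123456789', 'A. Customer', 'Spring field');
insert into @PP_SUMMARY_RESIDENTIAL values ('00123456789', 10);

with acc_with_ord as (
    select am.accountnumber, am.Customer_Name, am.svc_city,
           -- number the duplicates within each account;
           -- the order by column is an assumed tie-breaker
           row_number() over (partition by am.accountnumber
                              order by am.svc_city) as ord
    from @Account_Master am
),
unq_acc as (
    select * from acc_with_ord where ord = 1   -- keep one row per account
)
select am.Customer_Name, am.svc_city, sr.measure
from @PP_SUMMARY_RESIDENTIAL sr
left join unq_acc am
    on am.accountnumber =
       case when left(sr.fred_account_number, 2) = '00'
                then right(sr.fred_account_number, len(sr.fred_account_number) - 2)
            when left(sr.fred_account_number, 1) = '0'
                then right(sr.fred_account_number, len(sr.fred_account_number) - 1)
            else sr.fred_account_number
       end;
```

Because the duplicates are reduced to one row per account before the join, each transaction row should match at most one account master row. Note that the length arithmetic is written as len(...) - 2, with the subtraction outside the len() call.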

I would suggest using outer apply:

select am.[Customer_Name], am.[svc_city], sr.measure
from [dbo].[PP_SUMMARY_RESIDENTIAL] sr outer apply
     (select top 1 am.*
      from [CIS].[dbo].[Account_Master] am
      -- compare the normalized transaction account number to the master account number
      where am.accountnumber =
            (case when left(sr.fred_account_number, 2) = '00' then right(sr.fred_account_number, len(sr.fred_account_number) - 2)
                  when left(sr.fred_account_number, 1) = '0' then right(sr.fred_account_number, len(sr.fred_account_number) - 1)
                  else sr.fred_account_number
             end)
      order by am.accountnumber
     ) am;

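This will select at most one row from am for each row of sr; which one depends on the order by. Since all matching rows share the same account number, ordering by the account number alone leaves the choice arbitrary, so order by a column that distinguishes the duplicates if you need a repeatable result.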
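To illustrate how the order by decides which row wins, here is a minimal sketch of the outer apply approach. The table variables and rows are hypothetical stand-ins for the real tables, the account number columns are assumed to be character data, and svc_city is only an assumed choice of ordering column.

```sql
-- Hypothetical stand-ins for the real tables; column types are assumptions.
declare @Account_Master table (
    accountnumber varchar(20), Customer_Name varchar(50), svc_city varchar(50));
declare @PP_SUMMARY_RESIDENTIAL table (
    fred_account_number varchar(20), measure int);

-- Made-up rows: the same account with two city spellings.
insert into @Account_Master values
    ('123456789', 'A. Customer', 'Springfield'),
    ('123456789', 'A. Customer', 'Springfeild');
insert into @PP_SUMMARY_RESIDENTIAL values ('0123456789', 10);

select am.Customer_Name, am.svc_city, sr.measure
from @PP_SUMMARY_RESIDENTIAL sr
outer apply
    (select top 1 a.*
     from @Account_Master a
     where a.accountnumber =
           case when left(sr.fred_account_number, 2) = '00'
                    then right(sr.fred_account_number, len(sr.fred_account_number) - 2)
                when left(sr.fred_account_number, 1) = '0'
                    then right(sr.fred_account_number, len(sr.fred_account_number) - 1)
                else sr.fred_account_number
           end
     -- ordering by a column that differs between the duplicates
     -- makes the choice of row repeatable (assumed tie-breaker)
     order by a.svc_city
    ) am;
```

Changing the order by column (or adding desc) changes which of the duplicate rows is returned, while transaction rows with no matching account still appear with nulls, as they would with a left join.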
