简体   繁体   中英

How do I prevent duplicates from a SQL join?

I have the following tables:

customer :

id customer_number company firstname lastname account_manager_email email comments terms tax_id_number lead_source default_catalog credit_limit
99453 C00123456 Serenity Inc. Malcom Reynolds jim.smith@example.com mal@example.com The cap'n 1 NULL NULL 12345 NULL
99468 C00123456 Serenity Inc. Zoe Washburne jim.smith@example.com zoe@example.com NULL 1 NULL NULL NULL NULL
99960 C00123456 Serenity Inc. Hoban Washburne jim.smith@example.com wash@example.com NULL 1 NULL NULL NULL NULL
100088 C00123456 Serenity Inc. Inara Serra jim.smith@example.com inara@example.com NULL 1 NULL NULL 12345 NULL

customer_address :

id company street city state_abbreviation postcode telephone firstname lastname created_at
133996 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Malcom Reynolds 2017-05-08 12:45:53.000
134452 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-09 10:19:59.000
134961 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 987654321 2017-05-09 10:19:59.000
134962 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 192837465 2017-05-09 10:19:59.000
133995 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-09 10:19:59.000
133669 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 123456789 2017-05-18 10:29:42.000
133667 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-18 07:56:45.000
133666 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-31 07:56:46.000
133626 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-06-16 12:45:08.000
133668 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF PO 2017-06-16 12:45:08.000

I'm running this query to generate a CSV that I can use to import the data into another system:

     SELECT '"' + CAST(c.customer_number AS VARCHAR) + '"' AS 'Customer Number',
            '"' + CAST(c.company AS VARCHAR) + '"' AS 'Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Company Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Company City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Company State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Company Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Company Phone',
            '"' + CAST(c.firstname AS VARCHAR) + '"' AS 'Contact First Name',
            '"' + CAST(c.lastname AS VARCHAR) + '"' AS 'Contact Last Name',
            '"' + CAST(c.account_manager_email AS VARCHAR) + '"' AS 'Account Manager Email',
            '"' + CAST(a.company AS VARCHAR) + '"' AS 'Contact Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Contact Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Contact City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Contact State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Contact Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Contact Phone',
            '"' + CAST(c.email AS VARCHAR) + '"' AS 'Contact Email',
            '"' + CAST(c.comments AS VARCHAR) + '"' AS 'Internal Notes',
            '"' + CAST(c.terms AS VARCHAR) + '"' AS 'Terms',
            '"' + CAST(c.tax_id_number AS VARCHAR) + '"' AS 'Tax ID (US)',
            '"' + CAST(c.lead_source AS VARCHAR) + '"' AS 'Lead Source',
            '"' + CAST(c.default_catalog AS VARCHAR) + '"' AS 'Catalog',
            '"' + CAST(c.credit_limit AS VARCHAR) + '"' AS 'Credit Limit'
       FROM customer c,
            customer_address a
      WHERE c.customer_number = 'C00123456'
        AND a.company = c.company
   ORDER BY c.customer_number,
            c.created_at;

However, when I run that query I get 40 rows back, ten for each entry in customer . I've tried different join types but the results are the same.

A lot of this is legacy data and so the only thing that I appear to reliably be able to join on is the company name ("Serenity, Inc.")

I actually need two versions of this output. The first is a single row for each company containing the entry with the oldest created_at value from the customer table. The second are all the other records.

Note: This is on SQL Server 2005 (I know... an upgrade is planned but I have to get through this first)

To remove your duplicates you need to enumerate each row and assign a value based on an ordering criteria.

You could do this easily using a cte - I believe they are available in SQL Server 2005, I am of course unable to check.

with c as(
  select *, Row_Number() over(partition by customer_number order by id) rn
  from customer
),
ca as (
  select *, Row_Number() over(partition by company order by created_at) rn
  from customer_address
)
select <columns>
from c join ca on c.company=ca.company
where c.rn=1 and ca.rn=1 and c.customer_number='C00123456'

What other joins did you try exactly?

Here's one way of selecting the most recent customer_address record for each customer record:

SELECT c.*, a.* FROM
customer c
LEFT JOIN 
  ( SELECT x.* FROM customer_address x
    INNER JOIN 
    (SELECT company, MAX(created_at) AS created_at FROM customer_address 
       GROUP BY company) u
    ON u.company=x.company AND u.created_at=x.created_at
  ) a
ON a.company=c.company
WHERE c.customer_number = 'C00123456';

and all the other customer_address records would therefore be given by:

  ( SELECT x.* FROM customer_address x
    INNER JOIN 
    (SELECT company, MAX(created_at) AS created_at FROM customer_address 
       GROUP BY company) u
    ON u.company=x.company AND NOT u.created_at=x.created_at
  ) a

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2025 STACKOOM.COM