
Distinct Households Comparison Using Join

I am trying to compare two lists of unique household IDs using the DISTINCT keyword. The problem appears when I try to pull a third column, consisting of timestamps, into the results.

When I include only the two Household ID columns in the Select statement, the results seem to make sense. I get back two lists of unique IDs.

Here is that query:

select distinct e.household_id, a.hhid
FROM [dbo].[exposure] e
left outer join [dbo].[audience] a
on e.household_id = a.hhid

However, when I add the "e.imp_ts" column to the Select statement, it looks like SQL completely disregards the DISTINCT part of the query and pulls in all the duplicate households in the tables.

select distinct e.household_id, a.hhid, e.imp_ts
FROM [dbo].[exposure] e
left outer join [dbo].[audience] a
on e.household_id = a.hhid

Can someone please explain why the query doesn't work when I simply add a third column to the Select statement?

Thank you!

It is not that the second query "doesn't work"; it is being asked for a different result than the first query. DISTINCT applies to the entire select list, not to a single column, so it removes only rows in which every selected column matches. As others in the comments have pointed out, imp_ts is more granular than the household ID, so once it is selected, DISTINCT can no longer return one row per household. For example, household ID 12345 may have 5 records, each with a different timestamp; all 5 rows are distinct from one another, so all 5 are returned.
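To see this row-level behaviour in isolation, here is a minimal sketch that runs both queries against a tiny in-memory database. It uses Python's built-in sqlite3 module rather than SQL Server, and the sample rows and timestamps are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposure (household_id INTEGER, imp_ts TEXT);
CREATE TABLE audience (hhid INTEGER);
-- Hypothetical sample data: household 12345 has three impressions.
INSERT INTO exposure VALUES
  (12345, '2024-01-01 08:00:00'),
  (12345, '2024-01-01 09:30:00'),
  (12345, '2024-01-02 10:15:00'),
  (67890, '2024-01-01 12:00:00');
INSERT INTO audience VALUES (12345);
""")

# DISTINCT over two columns: one row per (household_id, hhid) pair.
two_cols = conn.execute("""
SELECT DISTINCT e.household_id, a.hhid
FROM exposure e
LEFT OUTER JOIN audience a ON e.household_id = a.hhid
""").fetchall()

# DISTINCT over three columns: every differing timestamp makes a new distinct row.
three_cols = conn.execute("""
SELECT DISTINCT e.household_id, a.hhid, e.imp_ts
FROM exposure e
LEFT OUTER JOIN audience a ON e.household_id = a.hhid
""").fetchall()

print(len(two_cols), "rows with two columns")
print(len(three_cols), "rows with three columns")
```

With this sample data the two-column query collapses household 12345 to a single row, while the three-column query keeps one row per timestamp.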

In order to resolve this, you have some choices:

  1. Remove imp_ts from the query.
  2. Return the minimum (earliest) timestamp for each household.
  3. Return the maximum (latest) timestamp for each household.

For #2 and #3 above, you can use MIN() or MAX() with a GROUP BY to achieve those results. Here is an example of using MIN() :

SELECT e.household_id, a.hhid, MIN(e.imp_ts) AS min_imp_ts
FROM [dbo].[exposure] e
LEFT OUTER JOIN [dbo].[audience] a
ON e.household_id = a.hhid
GROUP BY e.household_id, a.hhid
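The same pattern can return both the earliest and the latest timestamp in one pass. The sketch below is an illustration only: it uses Python's built-in sqlite3 module in place of SQL Server, the sample rows are invented, and the aliases first_imp_ts and last_imp_ts are hypothetical names. The timestamps are stored as ISO-8601 text so that MIN() and MAX() order them chronologically.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposure (household_id INTEGER, imp_ts TEXT);
CREATE TABLE audience (hhid INTEGER);
-- Hypothetical sample data: household 12345 has three impressions.
INSERT INTO exposure VALUES
  (12345, '2024-01-01 08:00:00'),
  (12345, '2024-01-01 09:30:00'),
  (12345, '2024-01-02 10:15:00'),
  (67890, '2024-01-01 12:00:00');
INSERT INTO audience VALUES (12345);
""")

# GROUP BY collapses each (household_id, hhid) pair to one row;
# MIN and MAX pick the earliest and latest timestamp within each group.
rows = conn.execute("""
SELECT e.household_id, a.hhid,
       MIN(e.imp_ts) AS first_imp_ts,
       MAX(e.imp_ts) AS last_imp_ts
FROM exposure e
LEFT OUTER JOIN audience a ON e.household_id = a.hhid
GROUP BY e.household_id, a.hhid
ORDER BY e.household_id
""").fetchall()

for row in rows:
    print(row)
```

Each household now appears exactly once, and a household with no match in audience still appears, with hhid as NULL (None in Python), because of the left outer join.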

I would suggest looking up GROUP BY examples online to get a better idea of how the rows are being grouped and aggregated.

Technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 