简体   繁体   中英

How to return distinct records based on two different values

I have the following sample of data in an Oracle table called MY_ATTRIBUTES:

Date        Emp_No  Attribute
-----------------------------------------------------
01/04/2012  1234567 APPLE
01/04/2012  1234567 ORANGE
01/04/2012  1234567 PINEAPPLE
01/04/2012  1234567 BANANA
01/04/2012  8888888 APPLE
01/04/2012  8888888 ORANGE
01/04/2012  2222222 APPLE
01/04/2012  2222222 ORANGE
01/04/2012  2222222 PINEAPPLE

Based on the above sample data, I need to return only the distinct records using only Date and Emp_No where these records contains data that have both the Attributes of 'APPLE' and 'PINEAPPLE'

So based on this criteria, I would expect the following two result records only, ie:

01/04/2012  1234567
01/04/2012  2222222

I am looking to write an Oracle SQL query that will return this result set based on the criteria described.

I should also point out that I need this SQL query to be used as a sub-query off the main SELECT.

try

SELECT DISTINCT A.Date, A.Emp_no 
FROM (SELECT * FROM MY_ATTRIBUTES WHERE ATTRIBUTE = 'APPLE') A 
INNER JOIN (SELECT * FROM MY_ATTRIBUTES WHERE ATTRIBUTE = 'PINEAPPLE' ) B ON B.Date = A.Date AND B.Emp_No = A.Emp_No 

There may be a more efficient way, but this should work:

SELECT DISTINCT DATE, EMP_NO
FROM   (
    SELECT DATE, EMP_NO
    FROM   MY_ATTRIBUTES
    WHERE  ATTRIBUTE = 'APPLE'
    INTERSECT
    SELECT DATE, EMP_NO
    FROM   MY_ATTRIBUTES
    WHERE  ATTRIBUTE = 'PINEAPPLE'
);

If there are no duplicates (ex. 2 APPLES on the same date and emp) this might be more efficient as there are no joins:

SELECT DATE, EMP_NO 
FROM MY_ATTRIBUTES
WHERE ATTRIBUTE = 'APPLE' OR  ATTRIBUTE = 'PINEAPPLE'
GROUP BY DATE, EMP_NO
HAVING COUNT(*) = 2

A variant on a theme:

with cte as 
 ( select date, empno, attribute
   from my_attributes
   where attribute in ('PINEAPPLE', 'APPLE') )
select  *
from (  select date, emp_no
        from cte
        where attribute = 'PINEAPPLE'
        intersect
        select date, emp_no
        from cte
        where attribute = 'APPLE' )

The performance of any of these solutions will depend on the selectivity of ATTRIBUTE. If you only have half a dozen fruits in your bowl you're looking at a full table scain and there are only limited options for tuning those. On the other hand if the column is a greengrocer's full of fruity delights (say 150+ distinct values) then you should get return from an index. But that still depends on distribution and skew: you might not see any benefits from an index if ATTRIBUTE is 90% apples and pineapples: in that case a full table scan is still the better option.

tl;dr

Tuning is hard

You can try the following:

SELECT DATE, EMP_NO FROM YOUR_TABLE WHERE ATTRIBUTE = 'APPLE' OR ATTRIBUTE = 'PINEAPPLE' GROUP BY DATE, EMP_NO HAVING COUNT(*) = 2;

SELECT DISTINCT DATE, EMP_NO
WHERE  ATTRIBUTE IN('APPLE','PINEAPPLE')
SELECT DISTINCT Emp_No, Date
FROM (
  SELECT *
  FROM MY_ATTRIBUTES m
  WHERE m.Attribute IN ('Apple','Pineapple')
  )
LIMIT 2
ORDER BY Emp_No

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM