How to not SELECT rows where certain columns are the same and one column is different?

Question

This seems like a simple thing, and I'm surprised I haven't done it before. I want to remove duplicates based on a few columns, but only when one particular column is different. I can do this either in SQL or in pandas, though SQL would be preferable. Given the following query:

SELECT fname, lname, order_date, product_id
FROM T_ORDERS

I want to remove any orders where fname, lname, and product_id are the same AND order_date is different, keeping the row with the later order_date. Is there an easy way to do this in SQL?

If I have to do it in Python/pandas, or if that would be much easier, I can do that as well.

Answer 1

One method uses NOT EXISTS:

SELECT fname, lname, order_date, product_id
FROM T_ORDERS o
WHERE NOT EXISTS (SELECT 1
                  FROM T_ORDERS o2
                  WHERE o2.fname = o.fname AND o2.lname = o.lname AND
                        o2.product_id = o.product_id AND
                        o2.order_date > o.order_date
                 );

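That is, select the orders for which no other order with the same fname, lname, and product_id has a later order_date. If several orders tie on the latest date, all of them are kept.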
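To see the query in action, here is a minimal sketch that runs it against an in-memory SQLite database. The sample rows are hypothetical, SQLite is only a stand-in for whatever engine T_ORDERS lives in, and dates are stored as ISO-8601 text so that string comparison matches chronological order. An ORDER BY is appended purely to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data; column names follow the question's T_ORDERS query.
rows = [
    ("Ann", "Lee", "2021-01-05", 10),
    ("Ann", "Lee", "2021-02-10", 10),  # later order, same person and product
    ("Ann", "Lee", "2021-01-20", 11),
    ("Bob", "Kim", "2021-03-01", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_ORDERS (fname TEXT, lname TEXT, order_date TEXT, product_id INTEGER)"
)
conn.executemany("INSERT INTO T_ORDERS VALUES (?, ?, ?, ?)", rows)

# The NOT EXISTS query from the answer: keep a row only if no other row with the
# same fname, lname and product_id has a strictly later order_date.
query = """
SELECT fname, lname, order_date, product_id
FROM T_ORDERS o
WHERE NOT EXISTS (SELECT 1
                  FROM T_ORDERS o2
                  WHERE o2.fname = o.fname AND o2.lname = o.lname AND
                        o2.product_id = o.product_id AND
                        o2.order_date > o.order_date
                 )
ORDER BY fname, product_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only the 2021-01-05 order for Ann/product 10 is dropped, because a later order exists for the same three columns.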

Answer 2

It's not that easy with SQL, AFAIK. You need some kind of (implicit) self-join one way or another.
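As an illustration of the kind of self-join meant here, a minimal sketch of an explicit LEFT JOIN anti-join follows. This formulation is an assumption on my part (the answer does not spell one out); it returns the same rows as a NOT EXISTS filter. The sample data is hypothetical and SQLite stands in for the real database, with ISO-8601 date strings.

```python
import sqlite3

# Hypothetical sample data using the question's column names.
rows = [
    ("Ann", "Lee", "2021-01-05", 10),
    ("Ann", "Lee", "2021-02-10", 10),
    ("Bob", "Kim", "2021-03-01", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_ORDERS (fname TEXT, lname TEXT, order_date TEXT, product_id INTEGER)"
)
conn.executemany("INSERT INTO T_ORDERS VALUES (?, ?, ?, ?)", rows)

# Pair each order with any later order for the same fname/lname/product_id;
# orders that found no later partner have NULLs on the "later" side and are kept.
query = """
SELECT o.fname, o.lname, o.order_date, o.product_id
FROM T_ORDERS o
LEFT JOIN T_ORDERS later
       ON later.fname = o.fname
      AND later.lname = o.lname
      AND later.product_id = o.product_id
      AND later.order_date > o.order_date
WHERE later.order_date IS NULL
ORDER BY o.fname
"""
result = conn.execute(query).fetchall()
print(result)
```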

For pandas, it's drop_duplicates:

# Sort newest first, then keep the first (latest) row per fname/lname/product_id
(df.sort_values('order_date', ascending=False)
   .drop_duplicates(['fname', 'lname', 'product_id'], keep='first')
)
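A self-contained version of the pandas approach, as a minimal sketch: the DataFrame below is hypothetical sample data with the question's columns, and order_date is parsed with pd.to_datetime so the sort is chronological. The final sort_index call is an addition that only restores the original row order.

```python
import pandas as pd

# Hypothetical sample data with the question's columns.
df = pd.DataFrame({
    "fname": ["Ann", "Ann", "Ann", "Bob"],
    "lname": ["Lee", "Lee", "Lee", "Kim"],
    "order_date": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-01-20", "2021-03-01"]
    ),
    "product_id": [10, 10, 11, 10],
})

# Newest orders first, so drop_duplicates (keep="first" by default) keeps the
# latest row for each fname/lname/product_id combination.
latest = (
    df.sort_values("order_date", ascending=False)
      .drop_duplicates(["fname", "lname", "product_id"])
      .sort_index()  # restore the original row order for readability
)
print(latest)
```

One difference from the NOT EXISTS query: when two orders tie on the latest date, drop_duplicates keeps exactly one of them, whereas NOT EXISTS keeps both.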

2 answers

solution1
1 2021-03-01 16:31:18

solution2
1 2021-03-01 16:32:24