
How to not SELECT rows where certain columns are the same and one column is different?

This seems like a simple thing that I'm surprised I haven't done before: I want to remove duplicates based on a few columns, but only when one particular column differs. I can do this in either SQL or pandas, though SQL would be preferable. Given the following query:

SELECT fname, lname, order_date, product_id
FROM T_ORDERS

I want to remove any orders where fname, lname, and product_id are the same and order_date is different, keeping only the row with the latest order_date. Is there an easy way to do this in SQL?

If I must do it in Python/pandas, or if that would be much easier, I can do that as well.

One method uses NOT EXISTS:

SELECT fname, lname, order_date, product_id
FROM T_ORDERS o
WHERE NOT EXISTS (SELECT 1
                  FROM T_ORDERS o2
                  WHERE o2.fname = o.fname AND o2.lname = o.lname AND
                        o2.product_id = o.product_id AND
                        o2.order_date > o.order_date
                 );

That is, select each order for which no other row with the same fname, lname, and product_id has a later order_date.
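To see the behavior on concrete rows, here is a minimal sketch that runs the same NOT EXISTS query against an in-memory SQLite database from Python. The sample rows are made up for illustration, the dates are assumed to be stored as ISO-8601 text (so that `>` compares them chronologically), and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data, not from the original question.
# Dates are ISO-8601 strings so that text comparison matches date order.
rows = [
    ("Ann", "Lee", "2023-01-05", 10),
    ("Ann", "Lee", "2023-03-01", 10),  # later order, same customer and product
    ("Ann", "Lee", "2023-02-01", 20),
    ("Bob", "Ray", "2023-01-09", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_ORDERS "
    "(fname TEXT, lname TEXT, order_date TEXT, product_id INTEGER)"
)
conn.executemany("INSERT INTO T_ORDERS VALUES (?, ?, ?, ?)", rows)

# Keep a row only if no other row for the same (fname, lname, product_id)
# has a later order_date.
query = """
SELECT fname, lname, order_date, product_id
FROM T_ORDERS o
WHERE NOT EXISTS (SELECT 1
                  FROM T_ORDERS o2
                  WHERE o2.fname = o.fname AND o2.lname = o.lname AND
                        o2.product_id = o.product_id AND
                        o2.order_date > o.order_date)
ORDER BY fname, lname, product_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

With this data, the 2023-01-05 row for Ann Lee and product 10 is dropped and the other three rows remain. Note that if two rows in a group share the same latest order_date, both are kept, because neither has a strictly later counterpart.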

It's not that easy with SQL, AFAIK. You need some form of join of the table against itself, implicit or explicit.
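One way to read "a join one way or another" is an explicit anti-join: LEFT JOIN the table to itself on the three key columns plus a later date, then keep only the rows that found no match. This is an assumed interpretation, not something the answer spells out. The sketch below runs it through Python's sqlite3 module with hypothetical sample rows and ISO-8601 text dates.

```python
import sqlite3

# Hypothetical sample data for illustration only.
sample_orders = [
    ("Ann", "Lee", "2023-01-05", 10),
    ("Ann", "Lee", "2023-03-01", 10),  # later order, same customer and product
    ("Bob", "Ray", "2023-01-09", 10),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE T_ORDERS "
    "(fname TEXT, lname TEXT, order_date TEXT, product_id INTEGER)"
)
db.executemany("INSERT INTO T_ORDERS VALUES (?, ?, ?, ?)", sample_orders)

# Anti-join: o2 matches only rows in the same group with a later date.
# Rows of o with no such match get NULLs for o2 and pass the WHERE filter.
anti_join = """
SELECT o.fname, o.lname, o.order_date, o.product_id
FROM T_ORDERS o
LEFT JOIN T_ORDERS o2
  ON o2.fname = o.fname AND o2.lname = o.lname
 AND o2.product_id = o.product_id
 AND o2.order_date > o.order_date
WHERE o2.order_date IS NULL
ORDER BY o.fname, o.lname, o.product_id
"""
kept = db.execute(anti_join).fetchall()
print(kept)
```

On this data the earlier Ann Lee order is filtered out and one row per (fname, lname, product_id) group remains, matching what a NOT EXISTS filter on the same condition would return.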

For pandas, it's drop_duplicates:

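# Sort newest first; drop_duplicates keeps the first row per key by default,
# so the latest order_date survives for each (fname, lname, product_id).
(df.sort_values('order_date', ascending=False)
   .drop_duplicates(['fname', 'lname', 'product_id'])
)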


 