
How to get the distinct count in a Pandas groupby

I would like to get the distinct count of products per order_number. I managed to get the total product count (thanks to the help of another SO user), but I can't figure out the distinct count.

This is what I have:

data['total_productcount'] = data.groupby(['order_number'])['order_number'].transform('size')
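To see what transform('size') does, here is a minimal sketch on a hypothetical two-order frame (the order labels and product IDs are made up for illustration). Every row receives the row count of its own group, so a product listed twice in an order is counted twice:

```python
import pandas as pd

# Hypothetical sample: order "A" has three rows (product 2 repeated), order "B" has two.
data = pd.DataFrame({
    "order_number": ["A", "A", "A", "B", "B"],
    "product_id": [1, 2, 2, 3, 4],
})

# transform('size') returns one value per row: the number of rows in that row's group.
data["total_productcount"] = (
    data.groupby(["order_number"])["order_number"].transform("size")
)
print(data)
```

Because the repeated product 2 still counts as a row, order "A" gets 3, not 2.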

And it gives:

order_number          product_id      total_productcount   
171-1046037-0511522   4260179734731   5                    
171-1046037-0511522   4054673034394   5                   
171-1046037-0511522   4054673001235   5                   
171-1046037-0511522   4054673005752   5                    
171-1046037-0511522   5011385960075   5                    
171-1046037-0511522   5011385960075   5    

This is the DataFrame I would like to generate (including distict_productcount):

order_number          product_id      total_productcount   distict_productcount
171-1046037-0511522   4260179734731   5                    1
171-1046037-0511522   4054673034394   5                    1
171-1046037-0511522   4054673001235   5                    1
171-1046037-0511522   4054673005752   5                    1
171-1046037-0511522   5011385960075   5                    1
171-1046037-0511522   5011385960075   5                    2

How can I generate "distict_productcount"?
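The distict_productcount values in the desired table (1 on the first five rows, 2 on the second 5011385960075 row) look like a running occurrence number of each product within its order rather than a count of distinct products, which would be a single number per order. If that running number is what is intended, a minimal sketch using groupby(...).cumcount() reproduces it on the sample rows (the column name occurrence is hypothetical):

```python
import pandas as pd

# The six sample rows from the question.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [4260179734731, 4054673034394, 4054673001235,
                   4054673005752, 5011385960075, 5011385960075],
})

# cumcount numbers rows within each (order, product) pair starting at 0,
# so +1 gives 1 on a product's first appearance and 2 on its second.
data["occurrence"] = data.groupby(["order_number", "product_id"]).cumcount() + 1
print(data)
```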

Use nunique to count the distinct product IDs in each order:

data.groupby('order_number').product_id.nunique()
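A minimal runnable sketch of the aggregation on the six sample rows from the question (total_productcount is left out). The result is a Series indexed by order_number with one value per order; the sample contains five distinct product IDs, so the single order gets 5:

```python
import pandas as pd

# The six sample rows from the question.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [4260179734731, 4054673034394, 4054673001235,
                   4054673005752, 5011385960075, 5011385960075],
})

# nunique aggregates: one distinct count per order_number (duplicates counted once).
distinct_per_order = data.groupby("order_number").product_id.nunique()
print(distinct_per_order)
```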

To attach the result to each row as a new column, use either transform or join:

via transform

s = data.groupby('order_number').product_id.transform('nunique')
data = data.assign(distinct_productcount=s)
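A minimal runnable sketch of the transform route on a hypothetical two-order frame (labels and IDs are made up). transform('nunique') returns a Series aligned with the original rows, so it can be assigned directly:

```python
import pandas as pd

# Hypothetical sample: order "A" has 2 distinct products, order "B" has 3 (product 3 repeated).
data = pd.DataFrame({
    "order_number": ["A", "A", "B", "B", "B", "B"],
    "product_id": [1, 2, 3, 3, 4, 5],
})

# Broadcast each order's distinct count back onto every row of that order.
s = data.groupby("order_number").product_id.transform("nunique")
data = data.assign(distinct_productcount=s)
print(data)
```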

via join

s = data.groupby('order_number').product_id.nunique()
data = data.join(s.rename('distinct_productcount'), on='order_number')
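A minimal runnable sketch of the join route on a hypothetical two-order frame (labels and IDs are made up). The aggregated Series is indexed by order_number, and on='order_number' matches that column of the caller against the Series index; rename gives the new column its name. The last line checks that this produces the same values as the transform route:

```python
import pandas as pd

# Hypothetical sample: order "A" has 2 distinct products, order "B" has 3 (product 3 repeated).
data = pd.DataFrame({
    "order_number": ["A", "A", "B", "B", "B", "B"],
    "product_id": [1, 2, 3, 3, 4, 5],
})

# One value per order, then joined back onto every row of that order.
s = data.groupby("order_number").product_id.nunique()
joined = data.join(s.rename("distinct_productcount"), on="order_number")
print(joined)

# Sanity check: join and transform agree row by row.
via_transform = data.groupby("order_number").product_id.transform("nunique")
assert (joined["distinct_productcount"] == via_transform).all()
```

The join route is handy when the per-order Series is already computed for other purposes; transform avoids the intermediate aggregate.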
