
Count the total of a groupby with pandas

import pandas as pd

details = {
    'order_number': ['#1', '#2', '#3', '#4', '#4'],
    'disc_code': ['no_discount', 'superman', 'hero', 'numero_uno', 'numero_uno'],
}
df = pd.DataFrame(details)

The sample above is a small excerpt. In my real data, len(df) --> 6408.
Each row corresponds to one product rather than one transaction. If I group the rows by order number, there are 3560 groups: len(df.groupby('order_number')) --> 3560

I want to count how many discount codes are used in total. (If no discount code is used, the value is 'no_discount'.)

In SQL, the syntax probably looks like this:

SELECT COUNT(*)
FROM transactions
WHERE discount_code != 'no_discount'
GROUP BY order_number

Use boolean indexing with GroupBy.size if you need the count per order_number:

df1 = (df[df['disc_code'].ne('no_discount')]
           .groupby('order_number')
           .size()
           .reset_index(name='count'))
print(df1)
  order_number  count
0           #2      1
1           #3      1
2           #4      2
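
Note that order #1 is missing from the output above, because its only row is filtered out before grouping. If orders without a discount code should still appear with a count of 0, one possible variant (a sketch on the sample data from the question; the names used and per_order are only illustrative) is to group the boolean mask itself and sum it:

```python
import pandas as pd

df = pd.DataFrame({
    'order_number': ['#1', '#2', '#3', '#4', '#4'],
    'disc_code': ['no_discount', 'superman', 'hero', 'numero_uno', 'numero_uno'],
})

# True where a real discount code was used on that row
used = df['disc_code'].ne('no_discount')

# Summing booleans per group counts the True values,
# so orders with no discount code keep a count of 0
per_order = (used.groupby(df['order_number'])
                 .sum()
                 .reset_index(name='count'))
print(per_order)
```

Here every order_number is kept, and the count is the number of rows in that order with a discount code.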

If you need a single total instead, count only the True values of the condition: test for inequality with Series.ne, then call sum:

out = df['disc_code'].ne('no_discount').sum()
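
Because each row is a product rather than a transaction, the row-based total counts an order once per product line. If the intended total is per order, or per distinct code, a sketch along these lines may be closer to what is wanted (this assumes the sample data from the question; the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'order_number': ['#1', '#2', '#3', '#4', '#4'],
    'disc_code': ['no_discount', 'superman', 'hero', 'numero_uno', 'numero_uno'],
})

# True where a real discount code was used on that row
mask = df['disc_code'].ne('no_discount')

# Rows (products) that carry a discount code
rows_with_code = int(mask.sum())

# Distinct orders that used a discount code
orders_with_code = df.loc[mask, 'order_number'].nunique()

# Distinct discount codes that appear at all
distinct_codes = df.loc[mask, 'disc_code'].nunique()

print(rows_with_code, orders_with_code, distinct_codes)
```

On the sample data, order #4 contributes two rows but only one order, so the row count and the order count differ.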


 