
Count number of matching values from pandas groupby

I have created a pandas dataframe for a store

I have columns Transaction and Item_Type

import pandas as pd
data = {'Transaction':[1, 2, 2, 2, 3], 'Item_Type':['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data, columns=['Transaction', 'Item_Type'])
Transaction Item_Type
1           Food
2           Drink
2           Food
2           Drink
3           Food

I am trying to group by transaction and count the number of drinks per transaction, but cannot find the right syntax to do it.

df = df.groupby(['Transaction','Item_Type']).size()

This sort of works, but gives me a multi-index Series, and I cannot figure out how to select the drink counts per transaction from it.

1/Food   1
2/Drink  2
2/Food   1
3/Food   1

This seems clunky - is there a better way?

This Stack Overflow question seemed most similar: Adding a 'count' column to the result of a groupby in pandas?

Another way is possible with pivot_table:

s = df.pivot_table(index='Transaction',
                   columns='Item_Type', aggfunc=len).stack().astype(int)

Or:

s = df.pivot_table(index=['Transaction', 'Item_Type'], aggfunc=len)  # thanks @Ch3steR
s.index = s.index.map("{0[0]}/{0[1]}".format)

print(s)

1/Food     1
2/Drink    2
2/Food     1
3/Food     1

Or if you wish to filter a particular category:

to_filter = 'Drink'
(df.pivot_table(index='Transaction', columns='Item_Type',
                aggfunc=len, fill_value=0)
   .filter(items=[to_filter]))

Item_Type    Drink
Transaction       
1                0
2                2
3                0

Edit: replacing original xs approach with unstack after seeing anky's answer.

>>> df.groupby('Transaction')['Item_Type'].value_counts().unstack(fill_value=0)['Drink']
Transaction
1    0
2    2
3    0
Name: Drink, dtype: int64
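For reference, the xs approach mentioned in the edit above might have looked something like this (a sketch; it slices the same MultiIndex Series the question produced):

```python
import pandas as pd

data = {'Transaction': [1, 2, 2, 2, 3],
        'Item_Type': ['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data)

# The multi-index Series from the question
s = df.groupby(['Transaction', 'Item_Type']).size()

# Cross-section on the Item_Type level pulls out the drink counts,
# but transactions with no drinks are simply absent from the result
drinks = s.xs('Drink', level='Item_Type')

# reindex fills in the missing transactions with 0
drinks = drinks.reindex(df['Transaction'].unique(), fill_value=0).sort_index()
print(drinks)
```

That missing-transactions quirk is presumably why unstack with fill_value=0 is the cleaner choice here.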

To count rows matching a particular condition, you can check the condition first and then sum the resulting Boolean Series within each group.

df['Item_Type'].eq('Drink').groupby(df['Transaction']).sum()

#Transaction
#1    0.0
#2    2.0
#3    0.0
#Name: Item_Type, dtype: float64
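If you want integer counts rather than the floats shown, a cast fixes the dtype; and transform('sum') broadcasts each transaction's count back onto its rows. A sketch of both, where Drink_Count is a made-up column name:

```python
import pandas as pd

data = {'Transaction': [1, 2, 2, 2, 3],
        'Item_Type': ['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data)

# Cast to int in case the grouped sum of Booleans comes back as float
counts = df['Item_Type'].eq('Drink').groupby(df['Transaction']).sum().astype(int)
print(counts)

# transform('sum') repeats each transaction's drink count on every row
# (Drink_Count is a hypothetical column name)
df['Drink_Count'] = (df['Item_Type'].eq('Drink')
                     .groupby(df['Transaction'])
                     .transform('sum')
                     .astype(int))
print(df)
```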

I think I found a solution.

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

df = df.groupby(['Transaction','Item_Type']).size().reset_index(name='counts')

This gives me the information I need:

Transaction Item_Type counts
1           Food      1
2           Drink     2
2           Food      1
3           Food      1
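Since reset_index turns the group keys back into plain columns, the drink rows can then be selected with ordinary Boolean indexing (a sketch):

```python
import pandas as pd

data = {'Transaction': [1, 2, 2, 2, 3],
        'Item_Type': ['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data)

# Flat counts frame, as in the answer above
counts = df.groupby(['Transaction', 'Item_Type']).size().reset_index(name='counts')

# Plain Boolean filter works now that the keys are ordinary columns
drinks = counts[counts['Item_Type'] == 'Drink']
print(drinks)
```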

You may use agg and value_counts

s = df.astype(str).agg('/'.join, axis=1).value_counts(sort=False)

Out[61]:
3/Food     1
2/Drink    2
1/Food     1
2/Food     1
dtype: int64

If you want to keep the original order, chain an additional sort_index:

s = df.astype(str).agg('/'.join, axis=1).value_counts().sort_index(kind='mergesort')

Out[62]:
1/Food     1
2/Drink    2
2/Food     1
3/Food     1
dtype: int64
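Since the result is keyed by the joined 'Transaction/Item_Type' string, a single count can then be looked up by its flattened label (a sketch):

```python
import pandas as pd

data = {'Transaction': [1, 2, 2, 2, 3],
        'Item_Type': ['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data)

# Join each row into a 'Transaction/Item_Type' key and count occurrences
s = df.astype(str).agg('/'.join, axis=1).value_counts().sort_index(kind='mergesort')

# Direct lookup of the drink count for transaction 2 by its flattened key
print(s['2/Drink'])
```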
