I would like to count the number of matches after a groupby in a pandas dataframe.
claim event material1 material2
A X M1 M2
A X M2 M3
A X M3 M0
A X M4 M4
A Y M5 M5
A Y M6 M0
B Z M7 M0
B Z M8 M0
First, I group by the pair (claim, event). Within each of these groups I want to count the matches between the columns material1 and material2, i.e. how many material1 values also appear in that group's material2 values.
For the groupby I have grouped = df.groupby(['claim', 'event']),
but then I don't know how to compare the two columns within each group.
It should return the following DataFrame:
claim event matches
A X 3
A Y 1
B Z 0
Do you have any idea how to do that?
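To pin down what "matches" means, here is a minimal sketch that rebuilds the sample data from the question and counts with a plain loop over the groups. It assumes the intended definition is "rows whose material1 value occurs among the material2 values of the same (claim, event) group", which is the reading that reproduces the expected output (a row-wise material1 == material2 test would give 1 for A X, not 3).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

# For each (claim, event) group, count the rows whose material1 value
# is contained in the set of that group's material2 values.
expected = {}
for (claim, event), g in df.groupby(['claim', 'event']):
    targets = set(g['material2'])
    expected[(claim, event)] = sum(m in targets for m in g['material1'])

print(expected)
```

The loop is only meant to make the definition explicit; the pandas answers below do the same counting without a Python-level loop over rows.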
Use isin to compare the columns, then groupby the claim and event columns with sum as the aggregation, cast the result to int, and finally call reset_index to turn the MultiIndex levels back into columns:
# True where material1 occurs anywhere in material2 (whole column, not per group)
a = df['material1'].isin(df['material2'])
df = a.groupby([df['claim'], df['event']]).sum().astype(int).reset_index(name='matches')
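A self-contained version of the first solution, with the sample data from the question rebuilt so it can be run as-is (the names out and a are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

# Boolean mask: does this row's material1 appear anywhere in material2?
a = df['material1'].isin(df['material2'])

# Group the mask by the claim/event columns, count the True values,
# and turn the (claim, event) MultiIndex back into regular columns.
out = (a.groupby([df['claim'], df['event']])
        .sum()
        .astype(int)
        .reset_index(name='matches'))
print(out)
```

On this data it produces the matches 3, 1 and 0 for A X, A Y and B Z, as requested.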
Solution that assigns to a new column:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'])['matches'].sum().reset_index()
Solution by @Wen (thank you), using as_index=False instead of reset_index:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'], as_index=False)['matches'].sum()
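One caveat with the vectorized solutions above: isin is called on the entire material2 column, so a value counts as a match even if it only appears in material2 of a different (claim, event) group. On the question's data both readings agree, but they can differ. The sketch below uses a hypothetical two-row frame (not from the question) to show the difference, comparing the whole-column count with a per-group count:

```python
import pandas as pd

# Hypothetical data: 'M2' occurs in material2, but only in group (A, X),
# while it appears in material1 of group (B, Z).
df = pd.DataFrame({
    'claim':     ['A', 'B'],
    'event':     ['X', 'Z'],
    'material1': ['M1', 'M2'],
    'material2': ['M2', 'M0'],
})

# Whole-column membership, as in the vectorized solutions
whole = (df['material1'].isin(df['material2']).astype(int)
         .groupby([df['claim'], df['event']]).sum())

# Membership restricted to each (claim, event) group
per_group = (df.groupby(['claim', 'event'])[['material1', 'material2']]
               .apply(lambda g: int(g['material1'].isin(g['material2']).sum())))

print(whole.tolist())      # [0, 1]
print(per_group.tolist())  # [0, 0]
```

If matches should only be counted inside each group, the per-group version is the one that follows that definition.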
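Solution with apply. I think it should be slower in larger DataFrames, but unlike the solutions above it checks material1 only against the material2 values of the same (claim, event) group: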
df = (df.groupby(['claim', 'event'])
.apply(lambda x : x['material1'].isin(x['material2']).astype(int).sum())
.reset_index(name='matches'))
print(df)
claim event matches
0 A X 3
1 A Y 1
2 B Z 0
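If you need the per-group semantics of the apply version without calling a Python function per group, one possible alternative (not part of the answers above, and not benchmarked here) is a left merge on (claim, event, material) with indicator=True. The sketch assumes the question's column names; targets, m and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

# One row per distinct (claim, event, material2), renamed so it can be
# joined against material1. Deduplicating keeps the left merge one-to-one.
targets = (df[['claim', 'event', 'material2']]
           .drop_duplicates()
           .rename(columns={'material2': 'material1'}))

# '_merge' is 'both' when this row's material1 occurs among the
# material2 values of the same (claim, event) group.
m = df.merge(targets, on=['claim', 'event', 'material1'],
             how='left', indicator=True)

out = ((m['_merge'] == 'both')
       .groupby([m['claim'], m['event']])
       .sum()
       .astype(int)
       .reset_index(name='matches'))
print(out)
```

Groups without any match, such as B Z, are still kept with 0 because every original row survives the left merge.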