Pandas groupby multiple columns with values of unique groupings as their own column

Example DataFrame:

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 3, 3, 3],
                   'Type': ['b', 'b', 'b', 'a', 'a', 'a', 'a', 'a']})

I would like to return the counts grouped by ID, with one column for each unique value in Type holding the count of that Type within the group, plus a total count per ID:

pd.DataFrame({'ID': [1, 2, 3], 'CountTypeA': [0, 2, 3], 'CountTypeB': [2, 1, 0], 'TotalCount': [2, 3, 3]})

Is there an easy way to do this using the groupby function in pandas?

For this you can use the pandas function get_dummies, which converts a categorical variable into dummy/indicator columns: one column per category, marking which rows belong to it. See the pandas documentation for get_dummies for details.
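To make the idea concrete, here is a minimal sketch of what get_dummies returns for a single column. The three-row Series is a hypothetical sample, not the data from the question, and dtype=int is passed only so the indicators show up as 0/1 (recent pandas versions return booleans by default).

```python
import pandas as pd

# Hypothetical three-row sample, only to show the shape of the result
types = pd.Series(['b', 'a', 'b'], name='Type')

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(types, dtype=int)
print(dummies)
```

Each row has a 1 in the column matching its Type, so summing these columns per ID yields the per-type counts.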

Check if this meets your requirements:

import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 3, 3, 3],
                   'Type': ['b', 'b', 'b', 'a', 'a', 'a', 'a', 'a']})

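# One indicator column per distinct value in Type
dummy_var = pd.get_dummies(df["Type"])
dummy_var.rename(columns={'a': 'CountTypeA', 'b': 'CountTypeB'}, inplace=True)

# Put the ID column back next to the indicator columns
df1 = pd.concat([df['ID'], dummy_var], axis=1)

# Summing the indicators within each ID gives the count per Type
df_group1 = df1.groupby(by=['ID'], as_index=False).sum()

# Total rows per ID is the sum of the per-type counts
df_group1['TotalCount'] = df_group1['CountTypeA'] + df_group1['CountTypeB']
print(df_group1)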

This will print the following result:

   ID  CountTypeA  CountTypeB  TotalCount
0   1           0           2           2
1   2           2           1           3
2   3           3           0           3
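Since the question asks specifically about groupby, here is an alternative sketch that skips get_dummies: group on both columns, count the rows with size(), and pivot Type into columns with unstack(). This is one possible approach rather than the method from the answer above; the way the column names are built (the 'CountType' prefix plus the upper-cased value) is an assumption chosen to match the requested output.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 3, 3, 3],
                   'Type': ['b', 'b', 'b', 'a', 'a', 'a', 'a', 'a']})

# Count rows per (ID, Type) pair, then pivot the Type values into columns;
# fill_value=0 covers ID/Type combinations that never occur
counts = df.groupby(['ID', 'Type']).size().unstack(fill_value=0)

# Rename the 'a'/'b' columns to CountTypeA/CountTypeB (naming is an assumption)
counts.columns = ['CountType' + str(t).upper() for t in counts.columns]

# Row-wise sum of the per-type counts gives the total per ID
counts['TotalCount'] = counts.sum(axis=1)

result = counts.reset_index()
print(result)
```

With the sample data this should give the same table as the get_dummies version. It also picks up new Type values without editing a rename mapping, since the columns are derived from whatever values appear in the data.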
