How to group by a column and count the values of another column in separate columns (pandas)

Question

Here's some example data:

import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'], ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]

df = pd.DataFrame(data, columns=['user', 'house', 'type'])

user house type
a1     1    a
b1     2    b
a1     3    a
c1     4    c
b1     5    a
a1     6    b
c1     7    a
a1     8    a

The final output I want is this (each type needs to be its own column):

user  houses  a  b  c
a1         4  3  1  0
b1         2  1  1  0
c1         2  1  0  1

Currently, I'm able to get it with the following code:

house = df.groupby(['user']).agg(houses=('house', 'count'))
a = df[df['type']=='a'].groupby(['user']).agg(a=('type', 'count'))
b = df[df['type']=='b'].groupby(['user']).agg(b=('type', 'count'))
c = df[df['type']=='c'].groupby(['user']).agg(c=('type', 'count'))

final = house.merge(a,on='user', how='left').merge(b,on='user', how='left').merge(c,on='user', how='left')

Is there a simpler, cleaner way to do this?

Answer 1

Here is one way, using get_dummies() with groupby() and sum():

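# Overwrite 'house' with 1 so that summing it counts the rows per user
df['house'] = 1
# Expand 'type' into indicator columns, then sum everything per user
df.drop('type', axis=1).assign(**pd.get_dummies(df['type'])).groupby('user').sum()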

      house  a  b  c
user                
a1        4  3  1  0
b1        2  1  1  0
c1        2  1  0  1
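The snippet above overwrites the original house values. A minimal sketch of the same get_dummies() idea that leaves df untouched might look like the following; the names dummies, houses and result are illustrative only, and dtype=int is passed on the assumption that integer indicator columns are wanted.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# One 0/1 indicator column per distinct type
dummies = pd.get_dummies(df['type'], dtype=int)

# Number of non-null house values per user; df['house'] is not modified
houses = df.groupby('user')['house'].count()

# Sum the indicators per user, then put the house count in front
result = dummies.groupby(df['user']).sum()
result.insert(0, 'houses', houses)

print(result)
```

Grouping the indicator frame by df['user'] works because both share the same row index, so no merge is needed.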

Answer 2

I would use crosstab with margins=True:

pd.crosstab(df.user,df.type,margins=True,margins_name='House').drop('House')
Out[51]: 
type  a  b  c  House
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2
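To match the column order asked for in the question, the margin column can be moved to the front. This is only a sketch: the margin name 'houses' and the variable table are choices made here, and the margin counts rows per user, which equals the house count only as long as neither column has missing values.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# margins=True adds a total row and a total column, both named 'houses' here
table = pd.crosstab(df['user'], df['type'], margins=True, margins_name='houses')

# Drop the total row but keep the total column
table = table.drop(index='houses')

# Reorder so the per-user total comes first
table = table[['houses', 'a', 'b', 'c']]

print(table)
```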

Answer 3

Using GroupBy.size with pd.crosstab and join:

grps = pd.crosstab(df['user'], df['type']).join(df.groupby('user')['house'].size())

      a  b  c  house
user                
a1    3  1  0      4
b1    1  1  0      2
c1    1  0  1      2

If you want user back as a column, use reset_index:

print(grps.reset_index())

  user  a  b  c  house
0   a1  3  1  0      4
1   b1  1  1  0      2
2   c1  1  0  1      2
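Put together as one self-contained script, the crosstab and join approach could be sketched as below. Renaming the count to houses and reordering the columns are additions made here to match the layout in the question. Note that size() counts every row in a group, while count() would skip rows where house is missing.

```python
import pandas as pd

data = [['a1', 1, 'a'], ['b1', 2, 'b'], ['a1', 3, 'a'], ['c1', 4, 'c'],
        ['b1', 5, 'a'], ['a1', 6, 'b'], ['c1', 7, 'a'], ['a1', 8, 'a']]
df = pd.DataFrame(data, columns=['user', 'house', 'type'])

# Per-user counts of each type, one column per type
counts = pd.crosstab(df['user'], df['type'])

# Rows per user, named 'houses' to match the desired output
houses = df.groupby('user')['house'].size().rename('houses')

# Join on the shared 'user' index, then turn 'user' back into a column
result = counts.join(houses).reset_index()

# Clear any leftover columns-axis label and reorder the columns
result = result.rename_axis(columns=None)[['user', 'houses', 'a', 'b', 'c']]

print(result)
```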

3 answers

Answer 1: score 5, 2019-11-03 15:03:32
Answer 2: score 5, 2019-11-03 15:05:55
Answer 3: score 3, accepted, 2019-11-03 15:08:01
