Sum of value occurrences grouped by another column in a pandas df

Question

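For each value of the column industry, I need to count how many times each value in the column name occurs. The name column holds comma-separated strings, so a single row can contribute several names. The goal is to get the total for each name within each industry. My data looks like this: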

industry            name
Home             Mike
Home             Mike,Angela,Elliot
Fashion          Angela,Elliot
Fashion          Angela,Elliot

The desired output is:

Home Mike:2 Angela:1 Elliot:1
Fashion Angela:2 Elliot:2

Answer 1

Moving this out of the comments, now debugged and confirmed to work:

# count() in the next line won't work without an extra column
df['name_list'] = df['name'].str.split(',')
df.explode('name_list').groupby(['industry', 'name_list']).count()

Result:

                    name
industry name_list      
Fashion  Angela        2
         Elliot        2
Home     Angela        1
         Elliot        1
         Mike          2
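
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-explode idea. It is a variation rather than the exact code above: it assumes the sample data from the question, overwrites name with the split lists instead of adding name_list, and swaps count() for size(), which counts rows per group and so does not need the extra column. The names exploded, counts and table are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Split each comma-separated string into a list, then give every name its own row.
exploded = df.assign(name=df["name"].str.split(",")).explode("name")

# size() counts rows per (industry, name) pair, so no helper column is required.
counts = exploded.groupby(["industry", "name"]).size()

# Optional: pivot the names into columns; pairs that never occur become 0.
table = counts.unstack(fill_value=0)
print(table)
```

The unstacked table has one row per industry and one column per name, which may be easier to read than the MultiIndex result when there are only a few distinct names.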

Answer 2

You can use collections.Counter to return a Series of dictionaries, as follows:

from collections import Counter
s = df.name.str.split(',').groupby(df.industry).sum().agg(Counter)

Out[506]:
industry
Fashion               {'Angela': 2, 'Elliot': 2}
Home       {'Mike': 2, 'Angela': 1, 'Elliot': 1}
Name: name, dtype: object

Note: each cell is a Counter object. Counter is a subclass of dict, so you can apply dictionary operations to it just as you would to a plain dictionary.
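
To illustrate what "dictionary operations" can look like, here is a rough sketch that builds one Counter per industry with a plain loop (not the groupby chain above) and then formats the lines requested in the question. The sample data is taken from the question; counters and lines are hypothetical names chosen for this example.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Build one Counter per industry, in order of first appearance.
counters = {}
for industry, names in zip(df["industry"], df["name"]):
    counters.setdefault(industry, Counter()).update(names.split(","))

# Ordinary dict access works; a missing key yields 0 instead of raising KeyError.
home_mike = counters["Home"]["Mike"]
fashion_mike = counters["Fashion"]["Mike"]

# items() iterates like a regular dict, which makes the requested layout easy to build.
lines = [
    industry + " " + " ".join(f"{name}:{n}" for name, n in counter.items())
    for industry, counter in counters.items()
]
print("\n".join(lines))
```

Counter also offers helpers that a plain dict lacks, such as most_common(), if the names need to be ranked by frequency.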

2 answers

Answer 1
1 accepted 2020-08-17 18:46:42

Answer 2
0 2020-08-17 18:48:46
