Sum of value occurrences grouped by another column in a pandas df

Question

I need to count the occurrences of each value in the column name, grouped by the column industry. The goal is to get the total for each name per industry. My data looks like this:

industry            name
Home             Mike
Home             Mike,Angela,Elliot
Fashion          Angela,Elliot
Fashion          Angela,Elliot

The desired output is:

Home Mike:2 Angela:1 Elliot:1
Fashion Angela:2 Elliot:2

Answer 1

Moving this out of the comments; debugged and confirmed working:

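# Split the comma-separated names into lists, stored in a helper column.
# count() needs a non-key column to count, so it won't work without this extra column.
df['name_list'] = df['name'].str.split(',')
# One row per name, then count the rows for each (industry, name) pair.
df.explode('name_list').groupby(['industry', 'name_list']).count()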

Result:

                    name
industry name_list      
Fashion  Angela        2
         Elliot        2
Home     Angela        1
         Elliot        1
         Mike          2
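
For reference, here is a self-contained sketch of the same split-and-explode idea, assuming the sample data from the question. It swaps in size() for count(), which counts rows per group and so does not need the helper column; the variable names exploded and counts are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Split each comma-separated string into a list, then give every name its own row.
exploded = df.assign(name=df["name"].str.split(",")).explode("name")

# size() counts rows per (industry, name) group, so no extra column is required.
counts = exploded.groupby(["industry", "name"]).size()
print(counts)
```

The result is a Series indexed by (industry, name), so a single count can be looked up with counts.loc[("Home", "Mike")].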

Answer 2

You can use collections.Counter to get a Series of dictionaries, as follows:

from collections import Counter
# Summing lists concatenates them per industry; apply then builds one Counter per group.
s = df.name.str.split(',').groupby(df.industry).sum().apply(Counter)

Out[506]:
industry
Fashion               {'Angela': 2, 'Elliot': 2}
Home       {'Mike': 2, 'Angela': 1, 'Elliot': 1}
Name: name, dtype: object

Note: Each cell is a Counter object. Counter is a subclass of dict, so you can use any dictionary operation on it.
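
As a rough illustration of what that allows, the sketch below builds the per-industry Counter objects in plain Python (a stand-in for the chained groupby call, assuming the sample data from the question) and then formats them into the layout the question asks for. The names counters and lines are hypothetical.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Build one Counter per industry, updating it with the names found in each row.
counters = {}
for industry, names in zip(df["industry"], df["name"]):
    counters.setdefault(industry, Counter()).update(names.split(","))

# Wrap the Counters in a Series so that each cell is a Counter object.
s = pd.Series(list(counters.values()), index=list(counters.keys()))

# Ordinary dict operations (items(), key lookup) work on every cell.
lines = [
    f"{industry} " + " ".join(f"{name}:{n}" for name, n in counter.items())
    for industry, counter in s.items()
]
print("\n".join(lines))
```

Counter-specific helpers are available too, for example s["Home"].most_common(1) to get the most frequent name in an industry.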

Answer 1: 1 vote, accepted, 2020-08-17 18:46:42

Answer 2: 0 votes, 2020-08-17 18:48:46
