How to group by and count values in a pandas DataFrame?

Question

Here is my DataFrame:

import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1)
], columns=['name', 'cluster', 'is_selected'])

I want to count the selected letters in each cluster, grouped by cluster. I tried this:

df.groupby('cluster')['is_selected'].value_counts()

and I get this output:

cluster  is_selected
0        0              1
1        0              1
         1              1
2        1              2
Name: is_selected, dtype: int64

But what I want is this format:

cluster  count_selected
0        1
1        1
2        2

How can I solve this?

Answer 1

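Based on your explanation, you want to count the selected letters (rows where is_selected is 1), grouped by cluster.

If that is what you are looking for, this should help: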

df[df.is_selected == 1].groupby(['cluster'])['name'].count().reset_index(name='count_selected')

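The output is slightly different from yours, but I am not sure why cluster 0 has a count of 1 in the expected output (no row in cluster 0 is selected), so I hope this is what you meant!

Output: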

    cluster count_selected
0   1       1
1   2       2
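
As a self-contained sketch of the filter-then-count approach above: the snippet below rebuilds the DataFrame from the question, applies the same filter, and then adds one optional step that is not part of the answer, assuming you also want clusters with no selected rows to appear with a count of 0. The variable names `selected`, `counts`, `all_clusters`, and `counts_all` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1)
], columns=['name', 'cluster', 'is_selected'])

# Keep only the selected rows, then count names per cluster.
# Cluster 0 has no selected rows, so it drops out of this result.
selected = df[df['is_selected'] == 1]
counts = (
    selected.groupby('cluster')['name']
    .count()
    .reset_index(name='count_selected')
)

# Optional (an assumption about what is wanted): bring back every cluster
# that exists in the original frame and fill the missing ones with 0.
all_clusters = sorted(df['cluster'].unique())
counts_all = (
    selected.groupby('cluster')['name']
    .count()
    .reindex(all_clusters, fill_value=0)
    .rename_axis('cluster')
    .reset_index(name='count_selected')
)

print(counts)
print(counts_all)
```

Filtering first is what removes cluster 0; reindexing against the full list of clusters is one way to restore it.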

Answer 2

This should give the expected output:

df.where(df['is_selected'] == 1).groupby('cluster')['is_selected'].count().rename(
    'count_selected').reindex(df['cluster'].drop_duplicates()).fillna(0).astype(int).reset_index()
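
A possibly simpler alternative, offered as a sketch rather than as either answerer's method: assuming is_selected only ever holds 0 or 1 (as in the sample data), summing the column per cluster counts the selected rows and keeps clusters with none selected. The name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1)
], columns=['name', 'cluster', 'is_selected'])

# Assumption: is_selected is a 0/1 flag, so its sum per cluster equals
# the number of selected rows in that cluster.
result = (
    df.groupby('cluster')['is_selected']
    .sum()
    .reset_index(name='count_selected')
)

print(result)  # clusters 0, 1, 2 with counts 0, 1, 2
```

Note that this yields 0 for cluster 0, not the 1 shown in the question's expected output, since no row in cluster 0 is selected. If the flag could contain other values, compare it first, for example by summing `df['is_selected'].eq(1)` grouped by `df['cluster']`.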


Answer 1
1 Accepted 2020-03-12 13:56:19

Answer 2
1 2020-03-12 14:01:19
