How to group by and count values in a pandas DataFrame?

This is my DataFrame:

import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1)
], columns=['name', 'cluster', 'is_selected'])

I want to count the selected letters in each cluster, grouped by cluster. I tried this: df.groupby('cluster')['is_selected'].value_counts() and I get this output:

cluster  is_selected
0        0              1
1        0              1
         1              1
2        1              2
Name: is_selected, dtype: int64

But what I want is this format:

cluster  count_selected
0        1        
1        1             
2        2       

How can I fix it?

Based on your explanation, you want to count the letters that are selected (is_selected equal to 1), grouped by cluster.

If that's what you're looking for, then this should help:

df[df.is_selected == 1].groupby(['cluster'])['name'].count().reset_index(name='count_selected')

The output is a little different from yours: cluster 0 does not appear, because none of its rows are selected. I'm not sure why cluster 0 would have a count of 1 in your expected output, so I hope this is what you need.

Output:

    cluster count_selected
0   1       1
1   2       2

This should give the expected format, keeping cluster 0 with a count of 0 because none of its rows are selected:

df.where(df['is_selected'] == 1).groupby('cluster')['is_selected'].count().rename(
    'count_selected').reindex(df['cluster'].drop_duplicates()).fillna(0).astype(int).reset_index()
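A shorter variant of the same idea, as a sketch rather than a definitive answer: compare is_selected to 1 to get booleans and sum them per cluster. Because the full frame is grouped (nothing is filtered out first), clusters with no selected rows stay in the result with a count of 0. The variable name count_selected is only illustrative, and the sample data is the one from the question.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1),
], columns=['name', 'cluster', 'is_selected'])

# Turn is_selected into booleans, then sum them per cluster (True counts as 1).
# No rows are dropped before grouping, so every cluster appears in the result.
count_selected = (
    df['is_selected'].eq(1)
    .groupby(df['cluster'])
    .sum()
    .astype(int)  # make sure the counts are integers
    .reset_index(name='count_selected')
)

print(count_selected)
```

On the sample data this gives counts of 0, 1 and 2 for clusters 0, 1 and 2.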
