Retrieve the number of distinct values from a pandas DataFrame based on a condition

Question

Dummy df:

import pandas as pd

# Note: every value, including is_correct, is stored as a string.
columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'],
        ['1', 'hi', '1.0'],
        ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'],
        ['2', 'cat', '1.0'],
        ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)

I want to create a new df with the following columns:

headers= ['id', 'number_of_different_correct_answers', 'number_of_different_incorrect_answers']

id should equal the id from the dummy df.

Consequently, I want to retrieve the number of different correct answers (is_correct == 1.0) for each id, and likewise for is_correct == 0.0 (incorrect answers). By "different" I mean that, within id 2, dog appears twice with is_correct == 0.0, so it should count only once.

Based on the dummy df, the new df would look like this:

id  number_of_different_correct_answers number_of_different_incorrect_answers
1   2                                   1
2   1                                   1
3   2                                   1

Answer 1

You can drop duplicates, group by id, and count the distinct values:

(df.drop_duplicates(['id','answer'])
   .groupby('id')['is_correct']
   .value_counts()
   .unstack(level=1)
   .rename(columns = {'0.0':'number_of_different_incorrect_answers', 
                      '1.0':'number_of_different_correct_answers'})
)

This produces:


is_correct  number_of_different_incorrect_answers   number_of_different_correct_answers
id      
1           1                                       2
2           1                                       1
3           1                                       2
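As a variation on the drop_duplicates approach, here is a minimal sketch that counts distinct answers per (id, is_correct) pair with nunique. One difference in behaviour: if the same answer ever appeared as both correct and incorrect under one id, it would be counted in both columns, whereas dropping duplicates on ['id', 'answer'] would keep only its first occurrence. The sketch also turns id back into a regular column and orders the columns as requested in the question. The variable names wanted and result are chosen here for illustration only.

```python
import pandas as pd

columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'], ['1', 'hi', '1.0'], ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'], ['2', 'cat', '1.0'], ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'], ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'], ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)

# Column order requested in the question (id is added by reset_index below).
wanted = ['number_of_different_correct_answers',
          'number_of_different_incorrect_answers']

result = (
    df.groupby(['id', 'is_correct'])['answer']
      .nunique()                      # distinct answers per (id, is_correct)
      .unstack(fill_value=0)          # is_correct values become columns
      .rename(columns={'1.0': wanted[0], '0.0': wanted[1]})  # keys are strings
      [wanted]
      .reset_index()                  # make id an ordinary column
)
result.columns.name = None            # drop the leftover 'is_correct' label

print(result)
```

fill_value=0 only matters when an id has no correct (or no incorrect) answers at all; without it, that cell would be NaN.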

Answer 2

This was answered before in "Python Pandas: pivot table with aggfunc = count unique distinct", but that answer uses an old version of pandas, so it needs some updating:

(df.pivot_table(values='answer', index='id', columns='is_correct',
                aggfunc=lambda x: len(x.unique()))
   .rename(columns={'1.0': 'number_of_different_correct_answers',
                    '0.0': 'number_of_different_incorrect_answers'})
)
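The same pivot can probably be written without the lambda. The sketch below assumes a reasonably recent pandas, where pivot_table accepts the string 'nunique' as aggfunc, and adds fill_value=0 so that an id with no rows for one of the is_correct values shows 0 instead of NaN. The variable name pivoted is illustrative.

```python
import pandas as pd

columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'], ['1', 'hi', '1.0'], ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'], ['2', 'cat', '1.0'], ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'], ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'], ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)

pivoted = (
    df.pivot_table(values='answer', index='id', columns='is_correct',
                   aggfunc='nunique',   # count distinct answers per cell
                   fill_value=0)        # 0 instead of NaN for empty cells
      .rename(columns={'1.0': 'number_of_different_correct_answers',
                       '0.0': 'number_of_different_incorrect_answers'})
)

print(pivoted)
```

Here id stays in the index; calling reset_index() on the result would turn it into a regular column.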


Answer 1
2 accepted 2021-03-10 21:06:23

Answer 2
2 2021-03-10 21:08:09
