Retrieve the count of different values from a pandas DataFrame based on a condition
Dummy df:
import pandas as pd

# Note: every value, including is_correct, is stored as a string.
columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'],
        ['1', 'hi', '1.0'],
        ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'],
        ['2', 'cat', '1.0'],
        ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)
I want to create a new df with the following columns:
headers = ['id', 'number_of_different_correct_answers', 'number_of_different_incorrect_answers']
id should equal id from the dummy df.
Consequently, I want to retrieve the number of different correct answers (is_correct == 1.0) for each id, and likewise for is_correct == 0.0 (incorrect answers). By "different" I mean distinct: within id 2 we have dog twice under is_correct == 0.0, so it should only count as 1.
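To make the "count as 1" rule concrete, here is a minimal sketch that looks only at the id 2 rows from the dummy df. The variable names (sample, incorrect, n_rows, n_distinct) are illustrative and not part of the original question. Because the dummy df stores is_correct as strings, the comparison is against '0.0' rather than the float 0.0.

```python
import pandas as pd

# Rows for id 2 only, copied from the dummy df.
sample = pd.DataFrame(
    columns=['id', 'answer', 'is_correct'],
    data=[['2', 'dog', '0.0'],
          ['2', 'cat', '1.0'],
          ['2', 'dog', '0.0']],
)

# Keep the incorrect rows, then count distinct answers rather than rows.
incorrect = sample[sample['is_correct'] == '0.0']
n_rows = len(incorrect)                     # counts 'dog' twice
n_distinct = incorrect['answer'].nunique()  # counts 'dog' once
print(n_rows, n_distinct)
```

The row count is 2, while the distinct count is 1, and the latter is what the new df should report for id 2.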
Based on the dummy df, the new df would look like this:
id number_of_different_correct_answers number_of_different_incorrect_answers
1 2 1
2 1 1
3 2 1
You can drop duplicates, group by id, and count distinct values:
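# Including is_correct in the subset keeps an answer that appears as both
# correct and incorrect under the same id from being dropped.
(df.drop_duplicates(['id', 'answer', 'is_correct'])
   .groupby('id')['is_correct']
   .value_counts()
   .unstack(level=1, fill_value=0)  # 0 instead of NaN when an id lacks a category
   .rename(columns={'0.0': 'number_of_different_incorrect_answers',
                    '1.0': 'number_of_different_correct_answers'})
)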
produces
is_correct number_of_different_incorrect_answers number_of_different_correct_answers
id
1 1 2
2 1 1
3 1 2
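As a possible alternative, the same counts can be obtained without dropping duplicates first, by grouping on both id and is_correct and calling nunique on answer. The sketch below is self-contained and also reorders the columns to match the headers requested in the question. The name result is illustrative.

```python
import pandas as pd

columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'], ['1', 'hi', '1.0'], ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'], ['2', 'cat', '1.0'], ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'], ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'], ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)

correct_col = 'number_of_different_correct_answers'
incorrect_col = 'number_of_different_incorrect_answers'

# Count distinct answers per (id, is_correct) pair, then pivot
# is_correct into columns; missing pairs become 0.
wide = (
    df.groupby(['id', 'is_correct'])['answer']
      .nunique()
      .unstack('is_correct', fill_value=0)
      .rename(columns={'1.0': correct_col, '0.0': incorrect_col})
)

# Put the columns in the order requested in the question and
# turn id back into a regular column.
result = wide[[correct_col, incorrect_col]].reset_index()
result.columns.name = None
print(result)
```

On the dummy df this gives 2, 1, 2 distinct correct answers and 1, 1, 1 distinct incorrect answers for ids 1, 2, 3.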
This was answered before in "Python Pandas: pivot table with aggfunc = count unique distinct", but that answer uses an old version of pandas, so it needs some updating:
(df.pivot_table(values='answer', index='id', columns='is_correct',
                aggfunc=lambda x: len(x.unique()))
   .rename(columns={'1.0': 'number_of_different_correct_answers',
                    '0.0': 'number_of_different_incorrect_answers'}))
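One case the dummy df does not exercise is an id that has no incorrect (or no correct) answers, which would leave a NaN in the pivoted table. The sketch below assumes you want 0 there, and uses pivot_table with the built-in 'nunique' aggregation and fill_value=0. The id '4' row is hypothetical data added only to trigger that case; it is not in the original question.

```python
import pandas as pd

columns = ['id', 'answer', 'is_correct']
data = [['2', 'dog', '0.0'],
        ['2', 'cat', '1.0'],
        ['2', 'dog', '0.0'],
        ['4', 'Rome', '1.0']]  # hypothetical id with no incorrect answers
df = pd.DataFrame(columns=columns, data=data)

table = (
    df.pivot_table(values='answer', index='id', columns='is_correct',
                   aggfunc='nunique',  # distinct answers per cell
                   fill_value=0)       # 0 where an id has no rows for a category
      .rename(columns={'1.0': 'number_of_different_correct_answers',
                       '0.0': 'number_of_different_incorrect_answers'})
)
print(table)
```

For id 4 the incorrect count comes out as 0 rather than NaN, while id 2 still counts dog only once.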