Creating a summary table from a grouped dataframe based on a condition
I have a pandas dataframe df that looks like this:
userid trip_id segmentid actual prediction
1 13 40 3 3
1 6 2 1 1
1 44 3 2 3
2 70 19 1 1
2 12 5 0 0
I need to create a summary dataframe dfsummary, grouped on the userid column, with three columns: userid, correct_classified, and incorrect_classified. A row counts as correctly classified if its actual and prediction values are the same, and as incorrectly classified otherwise.
I can get the correctly and incorrectly classified rows for the whole dataframe with:
correct_classified = df[df['actual'] == df['prediction']]
incorrect_classified = df[df['actual'] != df['prediction']]
but I can't work out how to create a summary table grouped by userid. It should look like this:
userid correct_classified incorrect_classified
1 2 1
2 2 0
You can use pd.crosstab after creating a conditional array:
import numpy as np
import pandas as pd
# Label each row, then count the labels per userid.
flags = np.where(df['actual'].eq(df['prediction']), 'correct', 'incorrect')
res = pd.crosstab(df['userid'], flags)
print(res)
col_0 correct incorrect
userid
1 2 1
2 2 0
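If the result should carry the exact column names from the question and keep userid as an ordinary column, the same crosstab idea can be extended slightly. This is only a sketch: the dataframe is rebuilt from the sample rows in the question, and dfsummary is the name the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'userid': [1, 1, 1, 2, 2],
    'trip_id': [13, 6, 44, 70, 12],
    'segmentid': [40, 2, 3, 19, 5],
    'actual': [3, 1, 2, 1, 0],
    'prediction': [3, 1, 3, 1, 0],
})

# Label rows directly with the requested column names.
flags = np.where(df['actual'].eq(df['prediction']),
                 'correct_classified', 'incorrect_classified')

# Count labels per userid, then turn userid back into a regular column.
dfsummary = pd.crosstab(df['userid'], flags).reset_index()
dfsummary.columns.name = None  # drop the leftover 'col_0' axis label
print(dfsummary)
```

One caveat: crosstab only creates columns for labels that actually occur, so if no row in the whole dataframe is misclassified, incorrect_classified would be missing. Calling reindex with the two column names and fill_value=0 before reset_index is one way to guard against that.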
You can also use pivot_table, i.e.:
m = df['actual'] == df['prediction']
# Assign the conditions to new columns and aggregate.
df.assign(correct_classified=m, incorrect_classified=~m).pivot_table(
    index='userid',
    aggfunc='sum',
    values=['correct_classified', 'incorrect_classified'])
Output:
correct_classified incorrect_classified
userid
1 2.0 1.0
2 2.0 0.0
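The counts above are shown as floats. If integer counts and a plain userid column are preferred, a groupby over the same boolean mask is one alternative. The sketch below is not part of the original answer; it rebuilds df from the question's sample rows and casts the mask to int so the sums are integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'userid': [1, 1, 1, 2, 2],
    'trip_id': [13, 6, 44, 70, 12],
    'segmentid': [40, 2, 3, 19, 5],
    'actual': [3, 1, 2, 1, 0],
    'prediction': [3, 1, 3, 1, 0],
})

# True where the prediction matches the actual value.
m = df['actual'].eq(df['prediction'])

# Cast the mask to int so that summing per user yields integer counts.
dfsummary = (
    df.assign(correct_classified=m.astype(int),
              incorrect_classified=(~m).astype(int))
      .groupby('userid', as_index=False)[['correct_classified',
                                          'incorrect_classified']]
      .sum()
)
print(dfsummary)
```

Because both columns are created before grouping, they are always present in the result, even when every prediction is correct.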