Creating summary table on groupby dataframe based on condition
I have a pandas DataFrame df that looks like this:
userid trip_id segmentid actual prediction
1 13 40 3 3
1 6 2 1 1
1 44 3 2 3
2 70 19 1 1
2 12 5 0 0
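I need to create a summary DataFrame dfsummary, grouped by the userid column, with three columns: userid, correct_classified, and incorrect_classified. A row counts as correctly classified if its actual and prediction values are equal, and as incorrectly classified otherwise.
I can select the correctly and incorrectly classified rows for the whole DataFrame with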
correct_classified = df[df['actual'] == df['prediction']]
incorrect_classified = df[df['actual'] != df['prediction']]
but I don't know how to build a summary table grouped by userid. It should look like this:
userid correct_classified incorrect_classified
1 2 1
2 2 0
You can use pd.crosstab after creating an array of condition labels:
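import numpy as np
import pandas as pd
# Label each row by whether the prediction matches the actual value.
flags = np.where(df['actual'].eq(df['prediction']), 'correct', 'incorrect')
res = pd.crosstab(df['userid'], flags)
print(res)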
col_0 correct incorrect
userid
1 2 1
2 2 0
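To try this end to end, here is a minimal sketch that rebuilds the question's sample data by hand and reshapes the crosstab result into the layout the question asks for. The name dfsummary comes from the question. The reindex, rename_axis, and reset_index steps are additions for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "userid":     [1, 1, 1, 2, 2],
    "trip_id":    [13, 6, 44, 70, 12],
    "segmentid":  [40, 2, 3, 19, 5],
    "actual":     [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# Label each row with the name of the summary column it should count toward.
flags = np.where(df["actual"].eq(df["prediction"]),
                 "correct_classified", "incorrect_classified")

dfsummary = (
    pd.crosstab(df["userid"], flags)
      # Guarantee both columns exist even if one label never occurs.
      .reindex(columns=["correct_classified", "incorrect_classified"], fill_value=0)
      # Drop the auto-generated column-axis name (col_0).
      .rename_axis(columns=None)
      # Turn userid from the index into an ordinary column.
      .reset_index()
)
print(dfsummary)
```

The reindex call is there as a safeguard: if every prediction in the data happened to be correct, crosstab alone would produce no incorrect_classified column at all.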
You can also use pivot_table:
m = df['actual'] == df['prediction']
# Assign the conditions to new columns and aggregate.
df.assign(correct_classified=m, incorrect_classified=~m).pivot_table(
    index='userid',
    aggfunc='sum',
    values=['correct_classified', 'incorrect_classified'])
Output:
correct_classified incorrect_classified
userid
1 2.0 1.0
2 2.0 0.0
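Since the question mentions grouping explicitly, the same boolean-column idea can also be written with a plain groupby. This is a sketch of an alternative, not something taken from the answers above. It rebuilds the sample data by hand, and the names is_correct and dfsummary are chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "userid":     [1, 1, 1, 2, 2],
    "trip_id":    [13, 6, 44, 70, 12],
    "segmentid":  [40, 2, 3, 19, 5],
    "actual":     [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# True where the prediction matches the actual value.
is_correct = df["actual"].eq(df["prediction"])

dfsummary = (
    df.assign(correct_classified=is_correct,
              incorrect_classified=~is_correct)
      # as_index=False keeps userid as a regular column in the result.
      .groupby("userid", as_index=False)[["correct_classified", "incorrect_classified"]]
      # Summing booleans counts the True values per user.
      .sum()
)
print(dfsummary)
```

Summing a boolean column counts its True values, so each sum is the number of matching (or non-matching) rows for that user.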