
Creating summary table on groupby dataframe based on condition

I have a pandas DataFrame df that looks like this:

userid  trip_id segmentid   actual  prediction
  1       13       40          3       3
  1       6        2           1       1
  1       44       3           2       3
  2       70       19          1       1
  2       12       5           0       0

I need to create a summary DataFrame dfsummary, grouped by the userid column, with three columns: userid, correct_classified, and incorrect_classified. A row is correctly classified if its actual and prediction values are equal; otherwise it is incorrectly classified.

I can find the correctly and incorrectly classified rows across the whole DataFrame with

correct_classified = df[df['actual'] == df['prediction']]
incorrect_classified = df[df['actual'] != df['prediction']]

but I can't work out how to create a summary table grouped by userid. It should look like this:

userid  correct_classified  incorrect_classified
  1             2                    1
  2             2                    0

You can use pd.crosstab after creating a conditional array:

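import numpy as np
import pandas as pd

# Label each row 'correct' or 'incorrect'.
flags = np.where(df['actual'].eq(df['prediction']), 'correct', 'incorrect')

# Count the labels per userid.
res = pd.crosstab(df['userid'], flags)
print(res)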

col_0   correct  incorrect
userid                    
1             2          1
2             2          0
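To get exactly the column names asked for in the question, the crosstab result can be tidied up a little. The following is a minimal sketch, not part of the original answer: it rebuilds df from the sample rows shown in the question, and the reindex step is an added safeguard for the case where one of the two labels never occurs.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "trip_id": [13, 6, 44, 70, 12],
    "segmentid": [40, 2, 3, 19, 5],
    "actual": [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# Use the requested column names directly as the labels.
flags = np.where(df["actual"].eq(df["prediction"]),
                 "correct_classified", "incorrect_classified")

dfsummary = (
    pd.crosstab(df["userid"], flags)
    # Keep both columns even if one label never appears in the data.
    .reindex(columns=["correct_classified", "incorrect_classified"], fill_value=0)
    # Drop the 'col_0' columns label and turn userid back into a column.
    .rename_axis(columns=None)
    .reset_index()
)
print(dfsummary)
```

The result has userid as an ordinary column rather than as the index, which matches the layout requested in the question.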

You can also use a pivot table:

m = df['actual'] == df['prediction']

# Assign the conditions to new columns and aggregate.
df.assign(correct_classified=m, incorrect_classified=~m).pivot_table(
    index='userid',
    aggfunc='sum',
    values=['correct_classified', 'incorrect_classified'])

Output:

     correct_classified  incorrect_classified
userid                                          
1                      2.0                   1.0
2                      2.0                   0.0
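Since the question asks about groupby specifically, the same boolean mask can also be summed per group directly. This is a sketch of an alternative, not something taken from the answers above; df is again rebuilt from the question's sample rows, and the cast to int is an optional extra to avoid float counts like those in the output above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "trip_id": [13, 6, 44, 70, 12],
    "segmentid": [40, 2, 3, 19, 5],
    "actual": [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# True where the prediction matches the actual value.
m = df["actual"].eq(df["prediction"])

cols = ["correct_classified", "incorrect_classified"]
dfsummary = (
    df.assign(correct_classified=m, incorrect_classified=~m)
      .groupby("userid")[cols]
      .sum()          # summing booleans counts the True values
      .astype(int)    # make sure the counts are integers
      .reset_index()  # userid becomes a regular column
)
print(dfsummary)
```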
