
I want to calculate Precision, Recall and Accuracy in a dataset

I have two sets of data: Predicted and Actual.

An algorithm predicts at most five unique, predefined parameters per row, and these are stored in Predicted. Assume the parameters are letters from a to z. For each row, I check how many of these parameters were predicted correctly.

Predicted:

Index P1 P2 P3 P4 P5
1     a  b  c  q
2     g
3     s  f  g  v  t

Actual:

Index P1 P2 P3 P4 P5
1     a  s  q  r  t
2     g
3     t  v

Code to generate these dataframes:

import pandas as pd

columns = ['P' + str(i) for i in range(1, 6)]

# Rows shorter than five values leave the remaining columns empty (None).
predicted = pd.DataFrame.from_records(
    columns=columns,
    data=[['a', 'b', 'c', 'q'], ['g'], ['s', 'f', 'g', 'v', 't']])

actual = pd.DataFrame.from_records(
    columns=columns,
    data=[['a', 's', 'q', 'r', 't'], ['g'], ['t', 'v']])

For Row 1: Correctly predicted parameters: a, q
For Row 2: Correctly predicted parameters: g
For Row 3: Correctly predicted parameters: t, v

How do I calculate Precision, Recall and Accuracy for this data?

To calculate the accuracy here, you just need to count the number of cells where the predicted parameter matches the actual parameter, ignoring cells where both are None. There are plenty of ways to do that; I'd take this simple option:

accuracy = ((predicted==actual) & (predicted.notna() | actual.notna())).sum().sum() / (predicted.notna() | actual.notna()).sum().sum()

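You can verify that it gives the desired result (2/11): two cells match position by position (a in row 1 and g in row 2), out of 11 cells in which at least one of the two values is present.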
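As a rough check of the 2/11 figure, the sketch below splits the expression into its numerator and denominator. To stay self-contained it rebuilds the two frames with explicit None padding; the helper `make_frame` and the intermediate names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

cols = ['P' + str(i) for i in range(1, 6)]

def make_frame(rows):
    # Hypothetical helper: pad short rows with None so each row has five cells.
    return pd.DataFrame([r + [None] * (len(cols) - len(r)) for r in rows],
                        columns=cols)

predicted = make_frame([['a', 'b', 'c', 'q'], ['g'], ['s', 'f', 'g', 'v', 't']])
actual = make_frame([['a', 's', 'q', 'r', 't'], ['g'], ['t', 'v']])

# Cells where at least one of the two frames holds a value.
either_present = predicted.notna() | actual.notna()

# Cells where both frames hold the same value in the same position.
matches = (predicted == actual) & either_present

n_match = int(matches.sum().sum())
n_cells = int(either_present.sum().sum())
accuracy = n_match / n_cells

print(n_match, n_cells, accuracy)
```

Note that this comparison is positional: in row 3, t and v appear in both frames but in different columns, so they do not count as matches here.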

Precision/recall is a bit trickier for multi-class data. You can do it per label, but you certainly don't have enough data here. I'd stick with accuracy for this case...
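If the order-independent matching from the question is what is wanted (row 3 counts t and v as correct even though the columns differ), one possible reading is to treat each row as a set of labels and pool the counts over all rows. This is only a sketch under that assumption, not the answer's method: precision is taken as correct predictions over all predictions, and recall as correct predictions over all actual parameters. The helpers `make_frame` and `row_sets` are hypothetical names introduced for illustration.

```python
import pandas as pd

cols = ['P' + str(i) for i in range(1, 6)]

def make_frame(rows):
    # Hypothetical helper: pad short rows with None so each row has five cells.
    return pd.DataFrame([r + [None] * (len(cols) - len(r)) for r in rows],
                        columns=cols)

predicted = make_frame([['a', 'b', 'c', 'q'], ['g'], ['s', 'f', 'g', 'v', 't']])
actual = make_frame([['a', 's', 'q', 'r', 't'], ['g'], ['t', 'v']])

def row_sets(df):
    # One set of non-missing labels per row; column position is ignored.
    return [set(row.dropna()) for _, row in df.iterrows()]

pred_sets = row_sets(predicted)
act_sets = row_sets(actual)

# Assumption: a prediction is "correct" if it appears anywhere in the same actual row.
true_pos = sum(len(p & a) for p, a in zip(pred_sets, act_sets))
n_predicted = sum(len(p) for p in pred_sets)
n_actual = sum(len(a) for a in act_sets)

precision = true_pos / n_predicted  # correct predictions / all predictions
recall = true_pos / n_actual        # correct predictions / all actual parameters

print(true_pos, n_predicted, n_actual, precision, recall)
```

Under this reading the sample data gives 5 correct predictions out of 10 predicted and 8 actual parameters, so precision is 0.5 and recall is 0.625. With only three rows these numbers are illustrative at best.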

PS: I've assumed that your accuracy calculation is pretty straightforward. If it isn't, it should be specified explicitly in your question...
