Pandas equivalent to calculate False Positive Rate
I have a pandas dataframe df from an ML classifier that has the following fields: userid, classifier_score, truth. I want to calculate the false positive rate per userid at a threshold of 0.62. The classifier_score values in the data range from 0.1999 to 0.89. Right now, I use a series of conditions to create a new column col that states whether the relationship between truth and classifier_score is a false positive, false negative, true positive or true negative:
df['col'] = df.apply(condition, axis=1)
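The condition function is not shown in the question. As a rough sketch only, assuming truth is 0/1 and a score strictly above the threshold counts as a predicted positive, the same kind of labels could be produced without a row-wise apply by using numpy.select. The toy data and the lowercase label names below are assumptions for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration; not from the question)
df = pd.DataFrame({
    "userid": ["0001", "0001", "0001", "0064"],
    "classifier_score": [0.70, 0.30, 0.15, 0.75],
    "truth": [1, 1, 0, 0],
})

th = 0.62
pred = df["classifier_score"] > th  # assumed: strictly above threshold = positive
actual = df["truth"] == 1

# Conditions are checked in order; rows matching none of them get the default
df["col"] = np.select(
    [pred & actual, pred & ~actual, ~pred & actual],
    ["tp", "fp", "fn"],
    default="tn",
)
```

This works on whole columns at once, which is usually much faster than df.apply(..., axis=1) on large frames.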
Then I store the unique userids in a list:
unique_users = df.userid.unique().tolist()
Then I loop through each one to calculate the false positive rate:
fpr_dict = {}
for uid in unique_users:
    fn, tn, fp, tp = 0, 0, 0, 0
    # labels in col for this user only
    elems = df.loc[df.userid == uid, 'col'].tolist()
    for elem in elems:
        if elem == 'fn': fn += 1
        elif elem == 'fp': fp += 1
        elif elem == 'tp': tp += 1
        elif elem == 'tn': tn += 1
    try:
        fpr = fp / (fp + tn)
    except ZeroDivisionError:
        # this user has no actual negatives
        fpr = 0.0
    fpr_dict[uid] = fpr
Is there a better way of doing this with just pandas functions?
Note: I initialize fn, tn, fp, tp to 0 because some user ids might not have all 4 of them; they will have some combination of the 4.
Edit: Dataframe
userid | classifier_score | truth | col
0001   | 0.6721           | 1     | TP
0001   | 0.2918           | 1     | FP
0001   | 0.1236           | 0     | TN
...
0064   | 0.7168           | 0     | FN
I didn't test it with an actual dataframe, but maybe try this:
th = 0.62
predicted_pos = df['classifier_score'] > th
userid_group = df.groupby('userid', sort=False)
userid_count = userid_group.size()
df['fp'] = predicted_pos & (df['truth'] == 0)
fpr = userid_group['fp'].sum() / userid_count
If you want a dictionary, you can put dict(fpr) at the end.
Edit: As OP pointed out, fpr = fp / (fp + tn), so the calculation should be:
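th = 0.62
predicted_pos = df['classifier_score'] > th
actual_neg = df['truth'] == 0
df['fp'] = predicted_pos & actual_neg   # predicted positive, actually negative
df['tn'] = ~predicted_pos & actual_neg  # predicted negative, actually negative
# group after adding the columns so the groupby sees them
userid_group = df.groupby('userid', sort=False)
fp = userid_group['fp'].sum()
tn = userid_group['tn'].sum()
fpr = fp / (fp + tn)  # NaN where a user has no actual negatives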
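For anyone who wants to try this end to end, here is a small self-contained sketch of the per-user calculation, assuming truth is 0/1 and a score strictly above the threshold counts as a predicted positive. The toy data is made up for illustration. The fillna(0.0) step is my addition: it mirrors the fpr = 0.0 fallback in the question's loop for users with no actual negatives, where the division would otherwise give NaN.

```python
import pandas as pd

# Toy data (made up for illustration; not from the question)
df = pd.DataFrame({
    "userid": ["0001", "0001", "0001", "0064", "0064", "0099"],
    "classifier_score": [0.70, 0.65, 0.15, 0.75, 0.30, 0.80],
    "truth": [0, 1, 0, 0, 1, 1],
})

th = 0.62
actual_neg = df["truth"] == 0
predicted_pos = df["classifier_score"] > th

# Per-user counts of false positives and true negatives
counts = (
    pd.DataFrame({
        "userid": df["userid"],
        "fp": predicted_pos & actual_neg,
        "tn": ~predicted_pos & actual_neg,
    })
    .groupby("userid", sort=False)[["fp", "tn"]]
    .sum()
)

# 0/0 yields NaN for users without negatives; map that to 0.0
fpr = (counts["fp"] / (counts["fp"] + counts["tn"])).fillna(0.0)
fpr_dict = fpr.to_dict()
```

Building the flags in a separate frame avoids adding helper columns to df. If users with no actual negatives should stay undefined rather than 0.0, drop the fillna call.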