I have an assignment problem where I have to use the y and y_score columns in a .csv file, where y is the actual value and y_score is the predicted score. I have to take the unique y_score values, sort them in ascending order, use each of them as a threshold value, and compare it with every y_score to find y_predicted. So, y_predicted = [0 if y_score < threshold else 1]. From this, find the number of false positives and false negatives to calculate A = 500 * number of false negatives + 100 * number of false positives. Finally, find the lowest value of metric A.
This is what I've written:
...
import pandas as pd
data2 = pd.read_csv('5_c.csv')
uniq2 = data2['prob'].unique()
uniq2.sort()
aa = []
for k, v in enumerate(uniq2):
    data2['thr'] = v
    for j, l in enumerate(data2['prob']):
        if l > v:
            data2['pred'] = 0
        else:
            data2['pred'] = 1
    df = data2[(data2['y']==0)&(data2['pred']==1)]
    FP = df.shape
    df = data2[(data2['y']==1)&(data2['pred']==1)]
    FN = df.shape
    A = 100*FN+500*FP
    aa.append(A)
m = np.argmin(aa)
print(m)
...
This is a sample of my csv file:
y prob
0 0.458521
0 0.505037
0 0.418652
0 0.412057
0 0.375579
0 0.595387
0 0.370288
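import pandas as pd
import numpy as np

df = pd.DataFrame({'y':[0,0,0,0,1,1,0],'prob':[0.458521,0.505037,0.418652,0.412057,0.375579,0.595387,0.370288]})
un = np.sort(df['prob'].unique())  # unique thresholds in ascending order
aa = []
for i in un:
    pred = (df['prob'] >= i).astype(int)  # 0 if prob < threshold else 1
    fp = ((df['y'] == 0) & (pred == 1)).sum()  # false positives
    fn = ((df['y'] == 1) & (pred == 0)).sum()  # false negatives
    aa.append(500*fn + 100*fp)
print(min(aa), un[np.argmin(aa)])  # lowest A and the threshold that gives it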
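The same threshold sweep can probably also be written without an explicit Python loop by comparing every score against every threshold at once. The following is only a sketch under some assumptions: `min_metric_a` is a hypothetical helper name (not from the original post), the data is the small hand-made sample rather than the real `5_c.csv`, and a row is predicted as 1 when `prob >= threshold`, following the rule `y_predicted = [0 if y_score < threshold else 1]` from the question.

```python
import numpy as np

def min_metric_a(y, scores, fn_cost=500, fp_cost=100):
    """Hypothetical helper: return (lowest A, threshold that achieves it)."""
    y = np.asarray(y)
    scores = np.asarray(scores)
    thresholds = np.unique(scores)  # np.unique returns sorted unique values
    # pred[i, j] is True when sample j is predicted 1 at thresholds[i]
    pred = scores[None, :] >= thresholds[:, None]
    fp = (pred & (y == 0)).sum(axis=1)   # predicted 1, actual 0
    fn = (~pred & (y == 1)).sum(axis=1)  # predicted 0, actual 1
    a = fn_cost * fn + fp_cost * fp      # metric A for every threshold
    best = int(np.argmin(a))             # first threshold with the lowest A
    return int(a[best]), float(thresholds[best])

y = [0, 0, 0, 0, 1, 1, 0]
prob = [0.458521, 0.505037, 0.418652, 0.412057, 0.375579, 0.595387, 0.370288]
best_a, best_thr = min_metric_a(y, prob)
print(best_a, best_thr)  # 400 0.375579
```

This builds a (number of thresholds) x (number of rows) boolean matrix, so for a very large file the loop version may be the more memory-friendly choice.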