I am trying to write a function that computes the area under a curve from two separate pandas DataFrames. The columns of the DataFrames unpack correctly, as confirmed by the print statement. However, I cannot work out how to create a new DataFrame from the separate frames, or how to reference a particular index of the fpr DataFrame to do a calculation.
def areaUnderCurve(tpr, fpr):
    auc = 0.0
    for fpr, tpr in zip(tpr['True Positive Rate'], fpr['False Positive Rate']):
        auc += np.trapz(y=fpr['False Positive Rate'], x=tpr['True Positive Rate'])
    return auc

calcAUC = areaUnderCurve(dataframe, dataframe)
print(calcAUC)
Sample output from print statement:
0 1.0 0.94
1 1.0 0.8866666666666667
2 1.0 0.8133333333333334
3 1.0 0.7866666666666666
4 1.0 0.78
5 1.0 0.6533333333333333
6 1.0 0.6333333333333333
7 1.0 0.6266666666666667
8 1.0 0.6133333333333333
9 1.0 0.6
Update: I changed the code to calculate the AUC based on the answer, and I now receive the following error: "float object is not subscriptable".
numpy has methods for numerical integration, e.g. np.trapz, which uses the trapezoid rule.
import numpy as np
np.trapz(y=tpr['True Positive Rate'], x=fpr['False Positive Rate'])
should give you the area under the ROC curve, with the true positive rate on the y-axis and the false positive rate on the x-axis.
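Regarding the "float object is not subscriptable" error from the updated question: a likely cause is that the for loop rebinds the names fpr and tpr to individual floats, so fpr['False Positive Rate'] inside the loop indexes a float rather than a DataFrame. The trapezoid rule already sums over all points, so no loop is needed. Below is a minimal sketch, assuming both frames have the same number of rows in the same threshold order. The name area_under_curve and the toy values are hypothetical, and the rule is written out with np.diff so it does not depend on whether your NumPy version still ships np.trapz (newer releases name it np.trapezoid).

```python
import numpy as np
import pandas as pd


def area_under_curve(tpr, fpr):
    """Trapezoid-rule area with FPR on the x-axis and TPR on the y-axis."""
    x = fpr["False Positive Rate"].to_numpy(dtype=float)
    y = tpr["True Positive Rate"].to_numpy(dtype=float)
    # Sort by x so the integral runs left to right (assumption: row i of
    # both frames belongs to the same threshold).
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    # Trapezoid rule written out: width of each step times mean height.
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0))


# Hypothetical toy data, not the asker's values.
tpr = pd.DataFrame({"True Positive Rate": [0.0, 0.5, 1.0, 1.0]})
fpr = pd.DataFrame({"False Positive Rate": [0.0, 0.0, 0.5, 1.0]})
auc = area_under_curve(tpr, fpr)
print(auc)  # 0.875
```

Note that the function takes two different frames; passing the same DataFrame twice only works if it contains both columns.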
@Jay Py To answer your first question: yes, you can create a single DataFrame from two DataFrames:
data=pd.DataFrame(zip(tpr['True Positive Rate'],fpr['False Positive Rate']),columns=['TPR','FPR'])
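As an alternative to zip, pd.concat can place the two columns side by side, and .iloc can read a single value by position, which covers the "reference a particular index" part of the question. This is only a sketch with hypothetical toy frames. It assumes both frames share the same index, because concat aligns rows on the index rather than on position.

```python
import pandas as pd

# Hypothetical toy frames standing in for the asker's data.
tpr = pd.DataFrame({"True Positive Rate": [0.0, 0.5, 1.0]})
fpr = pd.DataFrame({"False Positive Rate": [0.0, 0.2, 1.0]})

# Put the two columns side by side; rows are aligned on the index.
data = pd.concat(
    [tpr["True Positive Rate"], fpr["False Positive Rate"]], axis=1
)
data.columns = ["TPR", "FPR"]

# Position-based lookup of a single value (second row of the FPR column).
second_fpr = fpr["False Positive Rate"].iloc[1]

print(data)
print(second_fpr)
```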
To calculate the area under the ROC curve, you can apply the following logic to this DataFrame:
data['dFPR']=list(np.diff(data['FPR'].values)) + [0]
data['dTPR']=list(np.diff(data['TPR'].values)) + [0]
data['sum1']=data.apply(lambda x : x['TPR'] * x['dFPR'],axis=1)
data['sum2']=data.apply(lambda x : x['dTPR'] * x['dFPR'],axis=1)
ROC=sum(data['sum1']) + sum(data['sum2'])/2
Example with random values:
import numpy as np
import pandas as pd
tpr=pd.DataFrame(np.random.rand(100,2),columns=['Col1','True Positive Rate'])
fpr=pd.DataFrame(np.random.rand(100,2),columns=['Col2','False Positive Rate'])
data=pd.DataFrame(zip(tpr['True Positive Rate'],fpr['False Positive Rate']),columns=['TPR','FPR'])
data['dFPR']=list(np.diff(data['FPR'].values)) + [0]
data['dTPR']=list(np.diff(data['TPR'].values)) + [0]
data['sum1']=data.apply(lambda x : x['TPR'] * x['dFPR'],axis=1)
data['sum2']=data.apply(lambda x : x['dTPR'] * x['dFPR'],axis=1)
ROC=sum(data['sum1']) + sum(data['sum2'])/2
print(ROC)
0.773539521758
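The row-wise apply calls above can also be written as whole-column operations. The sketch below is one possible vectorized form, not the answer's own code. It uses sorted random values (an assumption, so that both rates increase monotonically as they would along a ROC curve) and a fixed seed chosen only for repeatability. It then checks the result against the trapezoid rule written out directly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
# Sorted random values so both rates increase monotonically.
data = pd.DataFrame({
    "TPR": np.sort(rng.random(100)),
    "FPR": np.sort(rng.random(100)),
})

# Forward differences; the last row has no successor, so its step is 0.
d_fpr = data["FPR"].diff().shift(-1).fillna(0.0)
d_tpr = data["TPR"].diff().shift(-1).fillna(0.0)

# Rectangle part plus triangle part of each trapezoid.
auc_manual = (data["TPR"] * d_fpr).sum() + (d_tpr * d_fpr).sum() / 2

# Same quantity from the trapezoid formula written out.
x = data["FPR"].to_numpy()
y = data["TPR"].to_numpy()
auc_trapezoid = np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2)

print(auc_manual, auc_trapezoid)
```

With unsorted random values, as in the example above, the number still comes out of the formula but does not correspond to the area under a meaningful ROC curve.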