Creating A Pandas DataFrame From Two Separate DataFrames

I'm trying to write a function that computes the area under a curve given two separate Pandas DataFrames. The columns of the DataFrames unpack correctly, as confirmed by a print statement. However, I can't work out how to create a new DataFrame from the two separate frames, or how to reference a particular index of the fpr DataFrame to do a calculation.

def areaUnderCurve(tpr, fpr):
    auc = 0.0
    for fpr, tpr in zip(tpr['True Positive Rate'], fpr['False Positive Rate']):
        auc += np.trapz(y=fpr['False Positive Rate'], x=tpr['True Positive Rate'])
    return auc

calcAUC = areaUnderCurve(dataframe, dataframe)
print(calcAUC)

Sample output from print statement:

0 1.0 0.94
1 1.0 0.8866666666666667
2 1.0 0.8133333333333334
3 1.0 0.7866666666666666
4 1.0 0.78
5 1.0 0.6533333333333333
6 1.0 0.6333333333333333
7 1.0 0.6266666666666667
8 1.0 0.6133333333333333
9 1.0 0.6

Update: the code above is my attempt to calculate the AUC based on the answer below. It raises the following error: "float object is not subscriptable".
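The error most likely comes from the loop header: `for fpr, tpr in zip(...)` rebinds the names `fpr` and `tpr` to single floats, so `fpr['False Positive Rate']` inside the loop tries to index a float. Because the trapezoidal routine integrates whole arrays in one call, the loop is not needed at all. A minimal sketch, assuming the column names from the question and the usual ROC orientation (FPR on the x-axis, TPR on the y-axis); the function name and the sample frames are made up for illustration:

```python
import numpy as np
import pandas as pd

# np.trapezoid is the NumPy 2.0+ name; older releases only provide np.trapz.
_trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz


def area_under_curve(tpr, fpr):
    """Integrate TPR over FPR with the trapezoidal rule (no loop needed)."""
    x = fpr['False Positive Rate'].to_numpy()
    y = tpr['True Positive Rate'].to_numpy()
    return float(_trapezoid(y=y, x=x))


# Hypothetical sample data, not taken from the question.
tpr_df = pd.DataFrame({'True Positive Rate': [0.0, 0.5, 1.0]})
fpr_df = pd.DataFrame({'False Positive Rate': [0.0, 0.5, 1.0]})

auc = area_under_curve(tpr_df, fpr_df)
print(auc)
```

Keeping the parameter names distinct from any loop or local variable names avoids this kind of shadowing.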

NumPy has methods for numerical integration, e.g. np.trapz, which uses the trapezoidal rule (it is named np.trapezoid from NumPy 2.0 onwards).

import numpy as np

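np.trapz(y=tpr['True Positive Rate'], x=fpr['False Positive Rate'])

should give you the area under the ROC curve, with the false positive rate on the x-axis and the true positive rate on the y-axis.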
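One thing to watch for: the trapezoidal rule is signed, so whichever column is passed as `x`, the result comes out negative if those values run from high to low (the sample output in the question appears to list rates in decreasing order). A small sketch of the effect and one possible fix, sorting by `x` before integrating; the helper name and the numbers are hypothetical:

```python
import numpy as np

# np.trapezoid is the NumPy 2.0+ name; older releases only provide np.trapz.
_trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz


def raw_and_sorted_area(x, y):
    """Return the area as given, and the area after sorting points by x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    raw = float(_trapezoid(y=y, x=x))
    order = np.argsort(x, kind="stable")  # ascending x
    fixed = float(_trapezoid(y=y[order], x=x[order]))
    return raw, fixed


# Hypothetical ROC points listed from high threshold-insensitive end to low.
fpr_vals = [1.0, 0.5, 0.0]
tpr_vals = [1.0, 0.8, 0.0]

raw_area, sorted_area = raw_and_sorted_area(fpr_vals, tpr_vals)
print(raw_area, sorted_area)
```

Here the unsorted call returns the same magnitude with the opposite sign, so sorting (or reversing) the points first gives the positive area.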

@Jay Py

To answer your first question, you can definitely create a DataFrame from two DataFrames:

data=pd.DataFrame(zip(tpr['True Positive Rate'],fpr['False Positive Rate']),columns=['TPR','FPR'])
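As an alternative sketch, `pd.concat` with `axis=1` places the two columns side by side. Unlike `zip`, which pairs rows purely by position, `concat` aligns rows on their index labels, so it assumes both frames share the same index. The sample values below are hypothetical. The last line also shows one way to reference a single entry by row label and column name:

```python
import pandas as pd

# Hypothetical frames standing in for the question's tpr and fpr.
tpr = pd.DataFrame({'True Positive Rate': [1.0, 0.8, 0.0]})
fpr = pd.DataFrame({'False Positive Rate': [1.0, 0.5, 0.0]})

# Pick one column from each frame, rename it, and join side by side.
combined = pd.concat(
    [
        tpr['True Positive Rate'].rename('TPR'),
        fpr['False Positive Rate'].rename('FPR'),
    ],
    axis=1,
)
print(combined)

# Look up one value: row label 1, column 'FPR'.
second_fpr = combined.loc[1, 'FPR']
print(second_fpr)
```

If the two frames have different indexes, calling `reset_index(drop=True)` on each first restores purely positional pairing.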

To calculate the area under the ROC curve, you can apply the following logic to this DataFrame:

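# Forward differences; the last row has no successor, so pad with 0
data['dFPR']=list(np.diff(data['FPR'].values)) + [0]
data['dTPR']=list(np.diff(data['TPR'].values)) + [0]
# sum1 is the rectangle part of each trapezoid; sum2 is twice the triangle part (halved below)
data['sum1']=data.apply(lambda x : x['TPR'] * x['dFPR'],axis=1)
data['sum2']=data.apply(lambda x : x['dTPR'] * x['dFPR'],axis=1)
# Despite its name, ROC holds the area under the ROC curve
ROC=sum(data['sum1']) + sum(data['sum2'])/2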
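The logic above is the trapezoidal rule written out by hand: each strip contributes TPR_i * dFPR_i (a rectangle) plus dTPR_i * dFPR_i / 2 (a triangle). The sketch below computes the same sum with array operations instead of `apply` and checks it against NumPy's built-in routine. The data values are hypothetical, and the variable names are chosen for illustration:

```python
import numpy as np
import pandas as pd

# np.trapezoid is the NumPy 2.0+ name; older releases only provide np.trapz.
_trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

# Hypothetical ROC points, sorted by increasing FPR.
data = pd.DataFrame({'TPR': [0.0, 0.8, 1.0], 'FPR': [0.0, 0.5, 1.0]})

tpr_arr = data['TPR'].to_numpy()
fpr_arr = data['FPR'].to_numpy()

d_fpr = np.diff(fpr_arr)   # strip widths
d_tpr = np.diff(tpr_arr)   # change in height across each strip
left_tpr = tpr_arr[:-1]    # height at the left edge of each strip

# Rectangle plus triangle for every strip, summed.
manual_auc = float(np.sum(left_tpr * d_fpr + d_tpr * d_fpr / 2))

# Same quantity from NumPy's trapezoidal integration.
numpy_auc = float(_trapezoid(y=tpr_arr, x=fpr_arr))

print(manual_auc, numpy_auc)
```

Both values agree, which suggests the hand-written sums can be replaced by a single call when the intermediate columns are not needed.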

Example with random values (the printed result will differ from run to run):

import numpy as np
import pandas as pd

tpr=pd.DataFrame(np.random.rand(100,2),columns=['Col1','True Positive Rate'])
fpr=pd.DataFrame(np.random.rand(100,2),columns=['Col2','False Positive Rate'])
data=pd.DataFrame(zip(tpr['True Positive Rate'],fpr['False Positive Rate']),columns=['TPR','FPR'])
data['dFPR']=list(np.diff(data['FPR'].values)) + [0]
data['dTPR']=list(np.diff(data['TPR'].values)) + [0]
data['sum1']=data.apply(lambda x : x['TPR'] * x['dFPR'],axis=1)
data['sum2']=data.apply(lambda x : x['dTPR'] * x['dFPR'],axis=1)
ROC=sum(data['sum1']) + sum(data['sum2'])/2
print(ROC)

0.773539521758
