Creating A Pandas DataFrame From Two Separate DataFrames

Question

I am trying to write a function that computes the area under a curve from two separate pandas DataFrames. The columns of both DataFrames unpack correctly, as confirmed by the print statement. However, I can't work out how to create a new DataFrame from the two separate frames, or how to reference a particular index of the fpr DataFrame to do the calculation.

def areaUnderCurve(tpr, fpr):
    auc = 0.0
    for fpr, tpr in zip(tpr['True Positive Rate'], fpr['False Positive Rate']):
        auc += np.trapz(y=fpr['False Positive Rate'], x=tpr['True Positive Rate'])
    return auc

calcAUC = areaUnderCurve(dataframe, dataframe)
print(calcAUC)

Sample output from print statement:

0 1.0 0.94
1 1.0 0.8866666666666667
2 1.0 0.8133333333333334
3 1.0 0.7866666666666666
4 1.0 0.78
5 1.0 0.6533333333333333
6 1.0 0.6333333333333333
7 1.0 0.6266666666666667
8 1.0 0.6133333333333333
9 1.0 0.6

Update: I changed the code to calculate the AUC based on the answer, and I now receive the following error: "'float' object is not subscriptable".

Answer 1

NumPy has functions for numerical integration, e.g. np.trapz, which applies the trapezoidal rule.

import numpy as np

# ROC convention: FPR on the x-axis, TPR on the y-axis
np.trapz(y=tpr['True Positive Rate'], x=fpr['False Positive Rate'])

should give you the area.
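
The "'float' object is not subscriptable" error in the question's update most likely comes from the loop: `for fpr, tpr in zip(...)` rebinds the names fpr and tpr to single floats, so `fpr['False Positive Rate']` inside the loop tries to index a float. The integration call already works on whole columns, so no loop is needed. Below is a minimal sketch, assuming both frames have the same number of rows in the same order. The name area_under_curve and the sample values are made up for illustration. It sorts the points by FPR first, since integrating over decreasing x values yields a negative area, and it uses np.trapezoid where available (NumPy 2.0 introduced that name and deprecated np.trapz).

```python
import numpy as np
import pandas as pd

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None)
if _trapezoid is None:
    _trapezoid = np.trapz


def area_under_curve(tpr, fpr):
    """Integrate TPR over FPR with the trapezoidal rule (no loop needed)."""
    x = fpr["False Positive Rate"].to_numpy()
    y = tpr["True Positive Rate"].to_numpy()
    # Sort by FPR, breaking ties by TPR, so the curve runs left to right.
    order = np.lexsort((y, x))
    return float(_trapezoid(y=y[order], x=x[order]))


# Made-up ROC points, listed in decreasing order like the sample output.
fpr = pd.DataFrame({"False Positive Rate": [1.0, 0.5, 0.0]})
tpr = pd.DataFrame({"True Positive Rate": [1.0, 0.8, 0.0]})
auc = area_under_curve(tpr, fpr)
print(auc)
```

Keeping the DataFrame parameters and the loop variables under different names would also avoid the error, but a single vectorized call is shorter and does not add the same total area once per row.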

Answer 2

@Jay Py

To answer your first question: yes, you can create a DataFrame from two DataFrames:

data = pd.DataFrame(list(zip(tpr['True Positive Rate'], fpr['False Positive Rate'])), columns=['TPR', 'FPR'])
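
As an alternative to zip, pd.concat can place the two columns side by side. This is only a sketch with made-up sample frames standing in for the question's data. Note that concat aligns rows on the index, so the sketch resets both indexes first, which assumes the rows of the two frames correspond by position.

```python
import pandas as pd

# Made-up stand-ins for the question's two frames.
tpr = pd.DataFrame({"Col1": [0, 1, 2], "True Positive Rate": [1.0, 0.8, 0.0]})
fpr = pd.DataFrame({"Col2": [0, 1, 2], "False Positive Rate": [1.0, 0.5, 0.0]})

# Take one column from each frame, rename it, and join them column-wise.
# reset_index(drop=True) makes the rows line up by position.
data = pd.concat(
    [
        tpr["True Positive Rate"].reset_index(drop=True).rename("TPR"),
        fpr["False Positive Rate"].reset_index(drop=True).rename("FPR"),
    ],
    axis=1,
)
print(data)
```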

To calculate the area under the ROC curve (AUC), you can apply the following logic to this DataFrame:

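# Forward differences; the last row has no successor, so pad with 0
data['dFPR'] = list(np.diff(data['FPR'].values)) + [0]
data['dTPR'] = list(np.diff(data['TPR'].values)) + [0]
# Rectangle part of each trapezoid: TPR_i * dFPR_i
data['sum1'] = data['TPR'] * data['dFPR']
# Triangle part of each trapezoid (halved below): dTPR_i * dFPR_i
data['sum2'] = data['dTPR'] * data['dFPR']
AUC = data['sum1'].sum() + data['sum2'].sum() / 2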

Example with random values

import numpy as np
import pandas as pd

# Two frames of random values standing in for the real TPR and FPR data
tpr = pd.DataFrame(np.random.rand(100, 2), columns=['Col1', 'True Positive Rate'])
fpr = pd.DataFrame(np.random.rand(100, 2), columns=['Col2', 'False Positive Rate'])
data = pd.DataFrame(list(zip(tpr['True Positive Rate'], fpr['False Positive Rate'])), columns=['TPR', 'FPR'])
# Forward differences; the last row has no successor, so pad with 0
data['dFPR'] = list(np.diff(data['FPR'].values)) + [0]
data['dTPR'] = list(np.diff(data['TPR'].values)) + [0]
data['sum1'] = data['TPR'] * data['dFPR']
data['sum2'] = data['dTPR'] * data['dFPR']
AUC = data['sum1'].sum() + data['sum2'].sum() / 2
print(AUC)

0.773539521758
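
The sum1/sum2 logic is the trapezoidal rule written out by hand: each strip contributes TPR_i * dFPR_i plus half of dTPR_i * dFPR_i. The sketch below checks that this agrees with NumPy's built-in trapezoidal integration on the same points. The seed and the array names are arbitrary choices for illustration, and the random inputs are unsorted, so the value itself is not a meaningful AUC.

```python
import numpy as np
import pandas as pd

# NumPy 2.0 added np.trapezoid and deprecated np.trapz; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None)
if _trapezoid is None:
    _trapezoid = np.trapz

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
data = pd.DataFrame({"TPR": rng.random(100), "FPR": rng.random(100)})

tpr_values = data["TPR"].to_numpy()
fpr_values = data["FPR"].to_numpy()
d_fpr = np.diff(fpr_values)  # width of each strip
d_tpr = np.diff(tpr_values)  # change in height across each strip

# Rectangle part plus half of the triangle part, summed over all strips.
manual = np.sum(tpr_values[:-1] * d_fpr) + np.sum(d_tpr * d_fpr) / 2
library = _trapezoid(y=tpr_values, x=fpr_values)

matches = bool(np.isclose(manual, library))
print(matches)
```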

2 answers

solution1
1 2018-04-22 05:58:43

solution2
1 2018-04-22 06:02:19
