
Finding Area Under the Curve (AUC) in R by Trapezoidal Rule

I have a list of data frames like the sample below. Each one has an ID, the observed values (yobs) and x (the independent variable). I want to find the AUC by the trapezoidal rule for each case (ID), so that my output (a master data frame) looks like the one shown at the end.

Can anybody suggest an efficient way of doing this? I have a large number of rows for each ID.

Thank you

# Some made-up code for only one data frame
Y1=c(0,2,5,7,9)
Y2=c(0,1,3,8,11)
Y3=c(0,4,8,9,12,14,18) 
t1=c(0:4)
t2=c(0:4)
t3=c(0:6) 

a1=data.frame(ID=1,y=Y1,x=t1) 
a2=data.frame(ID=2,y=Y2,x=t2) 
a3=data.frame(ID=3,y=Y3,x=t3) 
data=rbind(a1,a2,a3) 

# dataA (just to show; obs and time correspond to y and x in the code above)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 
9   2  8   3 
10  2 11   4 
11  3  0   0 
12  3  4   1 
13  3  8   2 
14  3  9   3 
15  3 12   4 
16  3 14   5 
17  3 18   6 

# dataB (just to show)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 

# dataC (just to show)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 

##Desired output

      ID  AUC
dataA  1   XX
dataA  2   XX
dataA  3   XX
dataB  1   XX
dataB  2   XX
dataC  1   XX
dataC  2   XX

Here are two other ways. The first uses integrate(...) on a function defined by the linear interpolation between the points. The second uses the trapz(...) function described in the comment from @nrussel.

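# f evaluates the linear interpolant through the points in df (columns x, then y)
f <- function(x,df) approxfun(df)(x)
# df[3:2] reorders the columns to (x, y), which is the order approxfun expects
sapply(split(data,data$ID),function(df) integrate(f,min(df$x),max(df$x),df[3:2])$value)
#    1    2    3 
# 18.5 17.5 56.0 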

library(caTools)
sapply(split(data,data$ID),function(df) trapz(df$x,df$y))
#    1    2    3 
# 18.5 17.5 56.0 

I'm guessing something like this would work

calcauc<-function(data) {
    psum<-function(x) rowSums(embed(x,2))
    stack(lapply(split(data, data$ID), function(z) 
        with(z, sum(psum(y) * diff(x)/ 2)))
    )
}
calcauc(data)

#   values ind
# 1   18.5   1
# 2   17.5   2
# 3   56.0   3

Of course, for ROC curves the x and y values normally lie between 0 and 1, which is why the "AUC" values here look so large. Really this is just the area of the polygon under the line defined by the points in the data set.
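For reference, the area that calcauc computes for each ID can be written as the usual trapezoidal sum. The index notation below is introduced only for illustration and assumes the n points of one ID are ordered by increasing x:

```latex
A = \sum_{i=1}^{n-1} \frac{(x_{i+1} - x_i)\,(y_i + y_{i+1})}{2}
```

In the code, `diff(x)` supplies the widths $x_{i+1} - x_i$ and `psum(y)` supplies the pair sums $y_i + y_{i+1}$.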

The psum function is just a helper function to calculate pairwise sums of neighbouring values (useful in the formula for the area of a trapezoid).
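As a small worked example of that helper, here it is applied to the ID 1 values from the made-up data in the question. The variable names `pair_sums` and `area` are only for illustration.

```r
# Pairwise sums of neighbouring elements: y[1]+y[2], y[2]+y[3], ...
psum <- function(x) rowSums(embed(x, 2))

# ID 1 from the made-up data in the question
y <- c(0, 2, 5, 7, 9)
x <- 0:4

pair_sums <- psum(y)
print(pair_sums)   # 2 7 12 16

# Each trapezoid has width diff(x) and mean height pair_sums / 2
area <- sum(pair_sums * diff(x) / 2)
print(area)        # 18.5, matching the value reported for ID 1
```

The same pair sums could also be written as `head(y, -1) + tail(y, -1)`, which avoids building the intermediate matrix that `embed()` creates.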

Basically we use split() to look at one ID at a time, then we calculate the area for each ID, then we use stack() to bring everything back into one data.frame.
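The question actually asks about a list of data frames (dataA, dataB, ...), so here is a rough sketch of how the same split-and-sum idea might be extended to a named list. The names `trap_area`, `auc_by_id`, `data_list` and `master` are made up for this sketch, and `dataB` is only approximated here by the first eight rows of the sample data; it assumes every data frame has the columns ID, y and x.

```r
# Made-up data from the question
data <- rbind(
  data.frame(ID = 1, y = c(0, 2, 5, 7, 9), x = 0:4),
  data.frame(ID = 2, y = c(0, 1, 3, 8, 11), x = 0:4),
  data.frame(ID = 3, y = c(0, 4, 8, 9, 12, 14, 18), x = 0:6)
)

# Hypothetical named list standing in for the real list of data frames
data_list <- list(dataA = data, dataB = data[1:8, ])

# Trapezoidal area for one set of points (sorted by x first, in case they are not)
trap_area <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# One row per ID for a single data frame
auc_by_id <- function(df) {
  areas <- sapply(split(df, df$ID), function(z) trap_area(z$x, z$y))
  data.frame(ID = names(areas), AUC = unname(areas))
}

# Apply to every data frame in the list and label rows with the list name
master <- do.call(rbind, lapply(names(data_list), function(nm) {
  data.frame(dataset = nm, auc_by_id(data_list[[nm]]))
}))
print(master)
```

Note that ID comes back as character here because it is taken from the names produced by split(); convert it with as.numeric() if numeric IDs are needed.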
