I couldn't find a suitable function in any R package for computing the Bray-Curtis distance (Bray-Curtis index) between the rows of two datasets, so I wrote my own, but it is very slow. Is there a faster way to do this?
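bray_dist <- function(a, b) {
  a_len <- nrow(a)
  b_len <- nrow(b)
  # Result: one row per row of a, one column per row of b
  distmatrix <- matrix(data = NA, nrow = a_len, ncol = b_len)
  # seq_len() is safe when a matrix has zero rows (seq(1, 0) would count down)
  for (i in seq_len(a_len)) {
    for (j in seq_len(b_len)) {
      distmatrix[i, j] <- 2 * sum(pmin(a[i, ], b[j, ])) / (sum(a[i, ]) + sum(b[j, ]))
    }
  }
  return(distmatrix)
}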
Here is some example data. My real data is larger than this, and because I repeat the calculation many times, it takes a long time.
a <- matrix( round(rnorm(400, 4)), ncol=5)
b <- matrix( round(rnorm(500, 5)), ncol=5)
With the bray_dist function, 100 calls take 9.700986 secs:
tstart<-Sys.time()
for (i in 1:100){
c<-bray_dist(a,b)
}
Sys.time()-tstart
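The same kind of measurement can be wrapped in base R's system.time(), which reports elapsed time for an expression directly. The following is only a sketch: time_reps is a hypothetical helper name, and the function being timed is a cheap stand-in rather than bray_dist itself.

```r
# Hypothetical helper (not from the original post): run f(...) `reps` times
# and return the total elapsed wall-clock time in seconds.
time_reps <- function(f, ..., reps = 100) {
  timing <- system.time(for (i in seq_len(reps)) f(...))
  unname(timing["elapsed"])
}

set.seed(1)
a <- matrix(round(rnorm(400, 4)), ncol = 5)
b <- matrix(round(rnorm(500, 5)), ncol = 5)

# Stand-in workload; replace it with bray_dist or b_dist to time those.
row_sum_grid <- function(x, y) outer(rowSums(x), rowSums(y), `+`)
elapsed <- time_reps(row_sum_grid, a, b, reps = 100)
```

Timing several repetitions in one call, as the loop above already does, smooths out some of the run-to-run noise.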
If I use b_dist, as shown by Mohanasundaram below, it takes 8.709085 secs, which is better than the previous version but still not very good.
tstart<-Sys.time()
for (i in 1:100){
c<-b_dist(a,b)
}
Sys.time()-tstart
With the apply function, I could see a difference of 0.03 seconds for a 100 by 100 matrix:
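b_dist <- function(a, b) {
  # For each row x of a, compute the index against every row y of b;
  # the outer apply yields one column per row of a, so transpose at the end.
  d <- t(apply(a, 1, function(x) {
    apply(b, 1, function(y) 2 * sum(pmin(x, y)) / (sum(x) + sum(y)))
  }))
  return(d)
}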
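Since apply is still a loop over rows in R, most of the per-pair overhead remains. One possible alternative, sketched here under the assumption that the matrices have many rows but few columns (as in the example data), is to loop over the columns instead and let outer() and pmin() handle all row pairs at once. The name bray_fast is made up for this sketch, and no timing is claimed for it.

```r
# Sketch: same quantity as bray_dist(), computed column by column.
# bray_fast is a hypothetical name, not a function from any package.
bray_fast <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  # Numerator: sum over columns of pairwise minima for every (row of a, row of b)
  num <- matrix(0, nrow = nrow(a), ncol = nrow(b))
  for (k in seq_len(ncol(a))) {
    num <- num + outer(a[, k], b[, k], pmin)
  }
  # Denominator: row total of a plus row total of b for every pair
  den <- outer(rowSums(a), rowSums(b), `+`)
  2 * num / den
}

set.seed(42)
a <- matrix(round(rnorm(400, 4)), ncol = 5)
b <- matrix(round(rnorm(500, 5)), ncol = 5)
sim <- bray_fast(a, b)
dim(sim)  # 80 rows (from a) by 100 columns (from b)
```

Like the original function, this returns 2 * sum(pmin) / (sum + sum), which is the similarity form of the index; 1 - sim gives the Bray-Curtis dissimilarity if that is what is needed.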