
Is there a fast way to calculate the Bray-Curtis distance between two datasets?

I couldn't find a suitable Bray-Curtis distance (Bray-Curtis index) function in R packages that works across two datasets, so I wrote one, but it is very slow. Is there a faster way to do this?

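    bray_dist <- function(a, b) {
      # Rows are samples; a and b must have the same number of columns.
      a_len <- nrow(a)
      b_len <- nrow(b)
      distmatrix <- matrix(data = NA, nrow = a_len, ncol = b_len)
      for (i in seq_len(a_len)) {
        for (j in seq_len(b_len)) {
          # 2 * (sum of element-wise minima) / (total of both rows).
          # Note: as usually defined, this ratio is the Bray-Curtis
          # similarity; the dissimilarity is 1 minus this value.
          distmatrix[i, j] <- 2 * sum(pmin(a[i, ], b[j, ])) / (sum(a[i, ]) + sum(b[j, ]))
        }
      }
      return(distmatrix)
    }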

Here is an example of the data. My real data is larger than this, and because I repeat the calculation many times, it takes a long time.

    a <- matrix(round(rnorm(400, 4)), ncol = 5)
    b <- matrix(round(rnorm(500, 5)), ncol = 5)

With the bray_dist function, 100 repetitions take 9.700986 secs:

    tstart <- Sys.time()
    for (i in 1:100) {
      c <- bray_dist(a, b)
    }
    Sys.time() - tstart

With b_dist, as shown in Mohanasundaram's answer below, 100 repetitions take 8.709085 secs. That is better than the previous version, but still not fast enough.

    tstart <- Sys.time()
    for (i in 1:100) {
      c <- b_dist(a, b)
    }
    Sys.time() - tstart

With the apply function, I could see a difference of 0.03 seconds for a 100 by 100 matrix:

    b_dist <- function(a, b) {
      d <- t(apply(a, 1, function(x) {
        apply(b, 1, function(y) 2*sum(pmin(x, y)/(sum(x) + sum(y))))
      }))
      return(d)
    }
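Both versions above still evaluate the formula once per pair of rows, so the number of R-level calls grows with nrow(a) * nrow(b). A possible alternative, sketched here under the assumption that the matrices have few columns compared with rows, is to loop over the columns instead and let `outer()` build each full pairwise matrix in one vectorized call. The name `bray_dist_vec` is hypothetical and not from the original post, and no timing is claimed for it; it is only meant to return the same values as `bray_dist`.

```r
# Hypothetical helper (not from the original post): same values as bray_dist(),
# but it loops over the columns instead of over every pair of rows.
bray_dist_vec <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  # Accumulate sum_k min(a[i, k], b[j, k]) one column at a time.
  min_sum <- matrix(0, nrow = nrow(a), ncol = nrow(b))
  for (k in seq_len(ncol(a))) {
    min_sum <- min_sum + outer(a[, k], b[, k], pmin)
  }
  # Denominator: sum(a[i, ]) + sum(b[j, ]) for every pair (i, j).
  2 * min_sum / outer(rowSums(a), rowSums(b), "+")
}

# Usage with data shaped like the example in the question.
set.seed(1)
a <- matrix(round(rnorm(400, 4)), ncol = 5)
b <- matrix(round(rnorm(500, 5)), ncol = 5)
res <- bray_dist_vec(a, b)
dim(res)  # 80 100
```

The trade-off is memory: each `outer()` call allocates a full nrow(a) by nrow(b) matrix, so for very large inputs it may be necessary to process the rows of `a` in chunks.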
