
Distance between two data frames of unequal size

I have two data frames of unequal size:

>df1

    b  c  d
a   2  3  4

>df2

   g  h  i
e  1  1  5
f  0  4  3

I need to calculate distances between elements of these data frames, by subtracting values contained in df1 from every row in df2 , thus I want to get:

   c  d  e
a  1  2  1
b  2  1  1

Defining myfunc1 <- function(x1, x2) { abs(x1 - x2) } and calling myfunc1(df1, df2), or writing df3 <- abs(df2 - df1) directly, doesn't work because the two data frames have different numbers of rows.
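To make the failure concrete, here is a minimal sketch that rebuilds the two data frames with data.frame() (an assumption; the question only shows them printed) and captures the error instead of stopping:

```r
# Rebuild the data from the question (construction via data.frame() is assumed)
df1 <- data.frame(b = 2, c = 3, d = 4, row.names = "a")
df2 <- data.frame(g = c(1, 0), h = c(1, 4), i = c(5, 3),
                  row.names = c("e", "f"))

# Arithmetic between two data frames requires identical dimensions,
# so subtracting a 1 x 3 data frame from a 2 x 3 one raises an error.
attempt <- tryCatch(abs(df2 - df1), error = function(e) e)

print(dim(df1))
print(dim(df2))
print(inherits(attempt, "error"))
```

The fix in every answer is therefore to make the shapes agree first, either by repeating the single row of df1 or by working column by column and letting vector recycling do the repetition.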

require(purrr)

map2_df(df1, df2, ~abs(.x - .y)) 

Or Gregor's method: abs(df2 - df1[rep(1, nrow(df2)), ])
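As a quick check of the row-replication idea, the following base-R sketch applies it to the question's data. It assumes df1 has exactly one row; the data.frame() construction and the helper name df1_rep are illustrative only.

```r
# Rebuild the data from the question (construction via data.frame() is assumed)
df1 <- data.frame(b = 2, c = 3, d = 4, row.names = "a")
df2 <- data.frame(g = c(1, 0), h = c(1, 4), i = c(5, 3),
                  row.names = c("e", "f"))

# Repeat the single row of df1 once per row of df2 so both have the same shape
df1_rep <- df1[rep(1, nrow(df2)), ]

# Element-wise absolute difference; row and column names are taken from df2
res <- abs(df2 - df1_rep)
print(res)
```

The values match the desired output in the question (1 2 1 and 2 1 1); only the row and column labels differ, since they are inherited from df2.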

In my limited test, map2_df appears to be roughly twice as fast (median of about 0.96 ms versus 2.2 ms in the benchmark below):

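library(data.table)      # provides fread() and rbindlist()
library(purrr)           # provides map2_df()
library(microbenchmark)

df1 <- fread("
b  c  d
2  3  4
")

df2 <- fread("
g  h  i
1  1  5
0  4  3
")

# Scale both inputs up for timing: df1 becomes 10000 identical rows, df2 20000 rows
df1 <- rbindlist(replicate(10000, df1, simplify = FALSE))
df2 <- rbindlist(replicate(10000, df2, simplify = FALSE))

# Column-wise difference; the shorter df1 columns are recycled against df2
f1 <- function() {
  map2_df(df1, df2, ~abs(.x - .y))
}

# Repeat the first row of df1 to match df2, then subtract
f2 <- function() {
  abs(df2 - df1[rep(1, nrow(df2)), ])
}

microbenchmark(f1(), f2())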

#Unit: microseconds
# expr      min        lq     mean   median       uq      max neval
# f1()  727.385  891.4875 1268.775  956.923 1471.179 4651.075   100
# f2() 1737.025 2011.2815 2666.744 2218.666 2889.846 8572.715   100


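If the first matrix (here df1) always has exactly one row, you can use base R's apply. The unlist() call makes sure a plain numeric vector is subtracted when df1 is a data frame rather than a matrix:

t(apply(df2, 1, function(x) abs(x - unlist(df1[1, ]))))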
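A related base-R alternative, not taken from the answers above, is sweep(), which subtracts a vector from every row of a matrix without an explicit loop. This is only a sketch under the same one-row assumption; the data.frame() construction and the name res_sweep are illustrative.

```r
# Rebuild the data from the question (construction via data.frame() is assumed)
df1 <- data.frame(b = 2, c = 3, d = 4, row.names = "a")
df2 <- data.frame(g = c(1, 0), h = c(1, 4), i = c(5, 3),
                  row.names = c("e", "f"))

# MARGIN = 2 lines the vector up with the columns, so each row of df2
# has the single row of df1 subtracted from it.
res_sweep <- abs(sweep(as.matrix(df2), 2, unlist(df1[1, ]), "-"))
print(res_sweep)
```

The result is a numeric matrix rather than a data frame; wrap it in as.data.frame() if a data frame is needed downstream.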


 