
Calculating distance between coordinates in different dataframes

Suppose I have the following two dataframes

library(tidyverse)  # provides map2_df(), str_c(), rename_all() and %>%

dfA <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfB <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfAB <- map2_df(dfA, dfB, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))

dfC <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfD <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfCD <- map2_df(dfC, dfD, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))

What I am looking for is the distance between coordinates in the first dataframe and the corresponding coordinates in the second, so that I get a third dataframe holding the distance between the first cell of dfAB and the first cell of dfCD, the distance between the 2nd cell of dfAB and the 2nd cell of dfCD, and so on. That is, calling the columns C and the rows R, I would like the distance between

dfAB        and     dfCD
C1 C2 C...          C1 C2 C...  
R1 R1               R1 R1   
R2 R2               R2 R2
... ...             ... ...
etc

What I am looking for is the distance between dfABC1R1 and dfCDC1R1, dfABC1R2 and dfCDC1R2, dfABC2R1 and dfCDC2R1, etc.

When I try using

dist(dfAB,dfCD)

I get the error: Error in dist(dfAB,dfCD) : invalid distance method

Any help is much appreciated

Note about the error message

  • Your dist(dfAB, dfCD) call throws an error because the second argument of dist() is a character string naming the method for distance calculation (e.g. "euclidean"), not a second data set;
  • The coordinate tuples in your dfAB and dfCD data frames are character strings. So even if dist() allowed you to calculate the distance between corresponding elements of two data frames, it would still throw an error.
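
To make both points concrete, here is a minimal sketch in base R. The toy points are my own and are not taken from the question. It shows that dist() takes a single numeric matrix plus a method name, that a second data set passed in the method slot makes the call fail, and that an "x,y" string has to be parsed before any arithmetic:

```r
# Two toy points, (0, 0) and (3, 4); illustrative values only
pts <- matrix(c(0, 0, 3, 4), ncol = 2, byrow = TRUE)

# dist() takes ONE numeric matrix; the second argument names the metric
d <- dist(pts, method = "euclidean")
as.numeric(d)
# [1] 5

# A second data set lands in the `method` slot, so the call fails
# (the exact error message may differ between R versions)
res <- tryCatch(dist(pts, pts), error = function(e) e)
inherits(res, "error")
# [1] TRUE

# A coordinate stored as text must be split and converted first
xy <- as.numeric(strsplit("3,4", ",")[[1]])
xy
# [1] 3 4
```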

My approach isn't very elegant, but it is probably a point from which you can start thinking about how to handle your data.

Data

library(tidyverse)  # as_tibble(), map2(), map2_df(), mutate(), ggplot()

set.seed(60007561)

dat <- split(rpois(60, 2), paste0('df_', rep(letters[1:4], each = 15)))

for(i in names(dat)) {
  assign(
    i, 
    data.frame(split(dat[[i]], rep(letters[1:5], each = 3)))
    )
}

# inspect the data

head(
  do.call(
    cbind,
    lapply(
      list(df_a, df_b, df_c, df_d), 
      cbind, 
      data.frame(' ' = rep(' ', 3), check.names = F)
      )
  )
)

#   a b c d e   a b c d e   a b c d e   a b c d e  
# 1 1 2 1 2 3   0 2 1 2 1   5 0 2 2 0   2 5 2 3 3  
# 2 5 0 2 0 3   2 5 1 2 3   0 0 4 2 2   3 1 1 1 2  
# 3 3 2 1 3 0   4 2 0 2 2   0 3 1 2 0   2 2 5 1 4 

Solution

Make two tibbles with columns a...e, where each column is itself a data frame with columns x and y, taken from data frames df_a, df_b and df_c, df_d respectively. The first tibble holds the "from" points and the second holds the "to" points:

df_ab <- as_tibble(lapply(map2(df_a, df_b, ~ list(x = .x, y = .y)), as.data.frame))
df_cd <- as_tibble(lapply(map2(df_c, df_d, ~ list(x = .x, y = .y)), as.data.frame))
#df_ab
# # A tibble: 3 x 5
#     a$x    $y   b$x    $y   c$x    $y   d$x    $y   e$x    $y
#   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1     1     0     2     2     1     1     2     2     3     1
# 2     5     2     0     5     2     1     0     2     3     3
# 3     3     4     2     2     1     0     3     2     0     2
#
#df_cd
# # A tibble: 3 x 5
#     a$x    $y   b$x    $y   c$x    $y   d$x    $y   e$x    $y
#   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1     5     2     0     5     2     2     2     3     0     3
# 2     0     3     0     1     4     1     2     1     2     2
# 3     0     2     3     2     1     5     2     1     0     4

Calculate the Euclidean distance from each "from" point to the corresponding "to" point:

distances <- map2_df(
  df_ab,
  df_cd,
  ~ sqrt((.x$x - .y$x)^2 + (.x$y - .y$y)^2)
)

#distances
# # A tibble: 3 x 5
#       a     b     c     d     e
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  4.47  3.61  1.41  1     3.61
# 2  5.10  4     2     2.24  1.41
# 3  3.61  1     5     1.41  2   

Note that the table above gives, for each of the columns a...e, the distance from each point in the first tibble to the corresponding point in the second tibble.
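
As a cross-check, the same distances can be computed without nested tibbles, because arithmetic on data frames of equal shape is element-wise. This is a minimal sketch in base R, not the method used above. The object names (from_x, from_y, to_x, to_y) are hypothetical, and the values for columns a and b are typed in by hand from the printed data so that the example is self-contained:

```r
# x and y of the "from" points (columns a and b of df_a and df_b)
from_x <- data.frame(a = c(1, 5, 3), b = c(2, 0, 2))
from_y <- data.frame(a = c(0, 2, 4), b = c(2, 5, 2))
# x and y of the "to" points (columns a and b of df_c and df_d)
to_x <- data.frame(a = c(5, 0, 0), b = c(0, 0, 3))
to_y <- data.frame(a = c(2, 3, 2), b = c(5, 1, 2))

# Element-wise Euclidean distance, cell by cell
distances_base <- sqrt((from_x - to_x)^2 + (from_y - to_y)^2)
round(distances_base, 2)
#      a    b
# 1 4.47 3.61
# 2 5.10 4.00
# 3 3.61 1.00
```

The result agrees with columns a and b of the distances tibble.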

Plot distances for column a (to verify the approach, or just for fun):

sgms <- data.frame(
  x    = df_a$a,
  y    = df_b$a,
  xend = df_c$a,
  yend = df_d$a,
  l    = round(distances$a, 1)
  ) %>%
  mutate(lx = (x + xend) / 2, ly = (y + yend) / 2)

ggplot(data = sgms, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_segment(lty = 3, arrow = arrow(angle = 10, type = 'closed', ends = 'last')) +
  geom_label(aes(x = lx, y = ly, label = l)) +
  geom_point(aes(x = x, y = y), pch = 21, size = 3.5) +
  geom_text(aes(x = x, y = y, label = sprintf('(%d, %d)', x, y)), vjust = 2) +
  geom_point(aes(x = xend, y = yend), pch = 22, size = 3.5) +
  geom_text(aes(x = xend, y = yend, label = sprintf('[%d, %d]', xend, yend)), vjust = -2) +
  expand_limits(y = c(-.5, 5.5), x = c(-.5, 5.5)) +
  ggtitle('Distances btw df_ab, df_cd; col. a') +
  ggthemes::theme_tufte()


I agree with @utubun: the use of dist is the problem in your example.

dist is helpful for calculating the distances between all pairs of rows in a single matrix. For example:

R> m1 <- matrix(1:8, nrow=4)
R> m1
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

R> dist(m1)
         1        2        3
2 1.414214                  
3 2.828427 1.414214         
4 4.242641 2.828427 1.414214

Note that the Euclidean distance between row [1,] and row [2,] is 1.414214, which is exactly the distance between the points (1,5) and (2,6), i.e. sqrt(2).
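
If you want to convince yourself of this, a quick check along these lines should do (a sketch; the names d_full and manual are my own). It compares one entry of the dist() result with the distance formula written out by hand, and counts how many pairwise distances dist() returns for 4 rows:

```r
m1 <- matrix(1:8, nrow = 4)

# Full symmetric 4 x 4 matrix of row-to-row distances
d_full <- as.matrix(dist(m1))

# Distance between row 1 and row 2, written out by hand
manual <- sqrt(sum((m1[1, ] - m1[2, ])^2))

all.equal(unname(d_full[2, 1]), manual)
# [1] TRUE

# dist() stores every pair of rows once: choose(4, 2) values
length(dist(m1))
# [1] 6
```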

In your case, you don't need a matrix of comparisons between all of your points: it sounds like you are only interested in the distance between each pair of corresponding coordinates in two matrices.

As mentioned by @utubun, you need to have numeric values for your coordinates. For example, you could do:

mat1 <- matrix(apply(dfAB, 1:2, function(x) as.numeric(unlist(strsplit(x, ',')))), ncol = 2, byrow = TRUE)
mat2 <- matrix(apply(dfCD, 1:2, function(x) as.numeric(unlist(strsplit(x, ',')))), ncol = 2, byrow = TRUE)

And that would give you two numeric matrices with 2 columns each, which could be considered as your coordinates:

R> mat1[1:5,]
     [,1] [,2]
[1,]    1    1
[2,]    3    2
[3,]    4    4
[4,]    1    5
[5,]    0    4

R> mat2[1:5,]
     [,1] [,2]
[1,]    4    2
[2,]    3    2
[3,]    2    3
[4,]    4    0
[5,]    3    2
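
The order of the rows in these matrices matters later, so here is a small sketch of what the apply() / matrix() combination does, using a made-up 2 x 2 character matrix (cells, parsed and coords are hypothetical names). The cells are walked column by column, so the coordinate pairs come out in the order column 1 top to bottom, then column 2, and so on:

```r
# Toy 2 x 2 table of "x,y" strings; values are illustrative only
cells <- matrix(c("1,2", "3,4", "5,6", "7,8"), nrow = 2)
cells
#      [,1]  [,2]
# [1,] "1,2" "5,6"
# [2,] "3,4" "7,8"

# Each cell becomes a length-2 numeric vector, giving a 2 x 2 x 2 array
parsed <- apply(cells, 1:2, function(s) as.numeric(strsplit(s, ",")[[1]]))
dim(parsed)
# [1] 2 2 2

# Flatten to one (x, y) pair per row, in column-major cell order
coords <- matrix(parsed, ncol = 2, byrow = TRUE)
coords
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
# [4,]    7    8
```

This is why the result can later be folded back into the original layout with matrix(..., byrow = FALSE).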

To get your distances, you could create a simple function to calculate euclidean distance:

euclidean_distance <- function(p, q){
  sqrt(sum((p - q)^2))
}

And then call the function row-wise through your two matrices of pairs of coordinates:

matrix(sapply(1:nrow(mat1), function(x) euclidean_distance(mat1[x,], mat2[x,])), ncol = 5, byrow = FALSE)

Which will give you your final matrix of distances:

          [,1]     [,2]     [,3]     [,4]     [,5]
 [1,] 3.162278 1.000000 4.472136 1.414214 1.414214
 [2,] 0.000000 0.000000 2.236068 1.000000 2.000000
 [3,] 2.236068 4.472136 5.385165 1.000000 1.000000
 [4,] 5.830952 2.236068 4.242641 3.605551 3.605551
 [5,] 3.605551 3.162278 1.000000 1.414214 2.000000
 [6,] 2.828427 2.000000 2.000000 2.000000 2.236068
 [7,] 1.414214 2.236068 2.236068 2.828427 1.414214
 [8,] 1.000000 4.000000 2.828427 2.000000 2.000000
 [9,] 3.000000 1.000000 1.000000 2.000000 1.000000
[10,] 2.236068 2.828427 4.123106 1.414214 1.000000
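
As a side note, the sapply() loop could probably be replaced by a vectorized expression. The sketch below uses only the first five coordinate pairs, copied by hand from the printed mat1 and mat2 above (from5 and to5 are hypothetical names), and checks that both approaches agree:

```r
# First five coordinate pairs, copied from the printed mat1 and mat2
from5 <- matrix(c(1, 1,  3, 2,  4, 4,  1, 5,  0, 4), ncol = 2, byrow = TRUE)
to5   <- matrix(c(4, 2,  3, 2,  2, 3,  4, 0,  3, 2), ncol = 2, byrow = TRUE)

euclidean_distance <- function(p, q) {
  sqrt(sum((p - q)^2))
}

# Row-by-row loop, as in the answer
d_loop <- sapply(seq_len(nrow(from5)),
                 function(i) euclidean_distance(from5[i, ], to5[i, ]))

# Vectorized equivalent: square the differences, sum each row, take the root
d_vec <- sqrt(rowSums((from5 - to5)^2))

all.equal(d_loop, d_vec)
# [1] TRUE
round(d_vec, 6)
# [1] 3.162278 0.000000 2.236068 5.830952 3.605551
```

These five values match the first five entries of column 1 in the matrix of distances.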

Data

library(tidyverse)  # provides map2_df(), str_c(), rename_all() and %>%

set.seed(5)

dfA <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfB <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfAB <- map2_df(dfA, dfB, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))

dfC <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfD <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfCD <- map2_df(dfC, dfD, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))
