unique pairwise distances between any points in the dataframe

Question

I have a list of ten points with X and Y coordinates. I would like to calculate the distance between every pair of points, keeping each pair only once: of the two distances 1-2 and 2-1, only one should be present. I have managed to remove the distance of each point to itself, but I could not remove the mirrored duplicates.

# Data Generation
library(dplyr)
df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1), ID = 1:10)

# Temporary key Creation
df <- df %>% mutate(key = 1) 

# Calculating pairwise distances
df %>% full_join(df, by = "key") %>% 
  mutate(dist = sqrt((X.x - X.y)^2 + (Y.x - Y.y)^2)) %>% 
  select(ID.x, ID.y, dist) %>% filter(!dist == 0) %>% head(11)

# Output 
#    ID.x ID.y       dist
# 1     1    2 0.90858911
# 2     1    3 0.71154587
# 3     1    4 0.05687495
# 4     1    5 1.03885510
# 5     1    6 0.93747717
# 6     1    7 0.62070415
# 7     1    8 0.88351690
# 8     1    9 0.89651911
# 9     1   10 0.05079906
# 10    2    1 0.90858911
# 11    2    3 0.27530175

How to achieve the expected output shown below?

# Expected Output 
#    ID.x ID.y       dist
# 1     1    2 0.90858911
# 2     1    3 0.71154587
# 3     1    4 0.05687495
# 4     1    5 1.03885510
# 5     1    6 0.93747717
# 6     1    7 0.62070415
# 7     1    8 0.88351690
# 8     1    9 0.89651911
# 9     1   10 0.05079906
# 10    2    3 0.27530175
# 11    2    4 0.5415415

This approach is also computationally slower than dist(), so I would be happy to hear about faster approaches.

Answer 1

I would use dist on the data and then reshape the output into the required format. You can replace dist with any other distance function. Here I've used letters rather than numbers as IDs to better show what is happening:

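library(dplyr)
library(tidyr)   # gather()
library(tibble)  # column_to_rownames(), rownames_to_column()

set.seed(42)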
df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1), ID = letters[1:10])

df %>% 
  column_to_rownames("ID") %>% # make the ID the row names; dist will use these. NB: will not work on a tibble
  dist() %>% 
  as.matrix() %>% 
  as.data.frame() %>% 
  rownames_to_column(var = "ID.x") %>% #capture the row IDs
  gather(key = ID.y, value = dist, -ID.x) %>% 
  filter(ID.x < ID.y) %>% 
  as_tibble()

   # A tibble: 45 x 3
    ID.x  ID.y      dist
   <chr> <chr>     <dbl>
 1     a     b 0.2623175
 2     a     c 0.7891034
 3     b     c 0.6856994
 4     a     d 0.2191960
 5     b     d 0.4757855
 6     c     d 0.8704269
 7     a     e 0.2730984
 8     b     e 0.3913770
 9     c     e 0.5912681
10     d     e 0.2800021
# ... with 35 more rows

dist is very fast compared with looping through the points and calculating the distances one pair at a time. The code can probably be made more efficient by working directly on the dist object rather than converting it into a matrix.
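As a rough sketch of what working directly on the dist object could look like (this is not the code from the answer above): a dist object stores the lower triangle column by column, which is the same order in which combn() enumerates index pairs, so the two can be lined up without building the full matrix. The names d, idx and pairs are made up for this illustration, and the coordinate columns are selected explicitly so that ID does not enter the distance.

```r
# Sketch: label the entries of a dist object without building the full matrix
set.seed(42)
df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1), ID = letters[1:10])

# Distances on the coordinate columns only
d <- dist(df[, c("X", "Y")])

# combn() enumerates index pairs (i, j) with i < j in the order
# (1,2), (1,3), ..., (1,n), (2,3), ..., which is the storage order of a dist object
idx <- combn(nrow(df), 2)

pairs <- data.frame(
  ID.x = df$ID[idx[1, ]],
  ID.y = df$ID[idx[2, ]],
  dist = as.vector(d)  # drops the dist attributes, keeps the numeric values
)

head(pairs, 3)
```

With n points this yields n * (n - 1) / 2 rows, so 45 rows for the ten points here, and it avoids allocating the n x n matrix that as.matrix() creates.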

Answer 2

Perhaps this is a slightly simpler approach:

df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1), ID = 1:10)

# Use only the coordinate columns; dist(df) would treat ID as a third coordinate
df2 <- data.frame(ID1 = rep(1:10, each = 10),
                  ID2 = 1:10,
                  distance = as.vector(as.matrix(dist(df[, c("X", "Y")]))))

Then get rid of the diagonal:

df2 <- df2[df2$ID1 != df2$ID2,]

Get rid of the upper triangle:

df2 <- df2[df2$ID1 < df2$ID2,]
df2

df2 now has 45 rows, one per unordered pair of points, with columns ID1, ID2 and distance. The distance values depend on the random draw.
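The two filtering steps above (diagonal, then one triangle) can probably be collapsed into one by asking the matrix for its strictly lower triangle. The following is a minimal sketch under that assumption, not the answerer's code; the seed and the names m and keep are arbitrary choices for the illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1), ID = 1:10)

# Full symmetric distance matrix on the coordinate columns only
m <- as.matrix(dist(df[, c("X", "Y")]))

# Positions strictly below the diagonal (row > col): each pair appears once
# and the zero self-distances are excluded
keep <- which(lower.tri(m), arr.ind = TRUE)

df2 <- data.frame(
  ID1 = df$ID[keep[, "col"]],
  ID2 = df$ID[keep[, "row"]],
  distance = m[keep]  # a two-column index matrix picks one cell per row
)

head(df2, 3)
```

Because which() walks the matrix column by column, the rows come out ordered as 1-2, 1-3, ..., 1-10, 2-3, ..., matching the expected output in the question.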
