Find most distant point all other points in R

Question

I'm having trouble finding a solution to this simple problem. I have been searching the forums and altought I have gotten closer to an answer this is not exactly what I need.

I'm trying to find from a set of x,y points which point is the furthest away from any other points ie not the maximum distance between points, but the one furthest from the rest.

I've tried

x <-c(x1,x2,x3....)
y <-c(y1,y2,y3...)
dist(cbind(x,y))

Which gives me a matrix of the distance between each point to each point. I can interrogate the data in MS Excel and find the answer. Find the minimum values in each column, then the maximum number across them.

If I were to plot the data, I would like to have as output the distance of either the red or blue line (depending on which is longer).

Answer 1

Starting from this example data set:

set.seed(100)
x <- rnorm(150)
y <- rnorm(150)
coord <- cbind(x,y)
dobj <- dist(coord)

Now dobj is a distance object, but you can't examine that directly. You'll have to convert that to a matrix first, and make sure you don't take zero distances between a point and itself into account:

dmat <- as.matrix(dobj)
diag(dmat) <- NA

The latter line replaces the diagonal values in the distance matrix with NA .

Now you can use the solution of amonk:

dmax <- max(apply(dmat,2,min,na.rm=TRUE))

This gives you the maximum distance to the nearest point. If you want to know which points these are, you can take an extra step :

which(dmat == dmax, arr.ind = TRUE)
#     row col
# 130 130  59
# 59   59 130

So point 130 and 59 are the two points fulfilling your conditions. Plotting this gives you:

id <- which(dmat == dmax, arr.ind = TRUE) 
plot(coord)
lines(coord[id[1,],], col = 'red')

Note how you get this info twice, as euclidean distances between two points are symmetric (A -> B is as long as B -> A ).

Answer 2

So for df as your initial data frame you can perform the following:

df<-NULL#initialize object 
for(i in 1:10)#create 10 vectors with 10 pseudorandom numbers each
  df<-cbind(df,runif(10))#fill the dataframe

cordf<-cor(df);diag(cordf)<-NA #create correlation matrix and set diagonal values to NA

Hence:

             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]        [,9]       [,10]
[1,]          NA -0.03540916 -0.29183703  0.49358124  0.79846794  0.29490246  0.47661166 -0.51181482 -0.04116772 -0.10797632
[2,] -0.03540916          NA  0.47550478 -0.24284088 -0.01898357 -0.67102287 -0.46488410  0.01125144  0.13355919  0.08738474
[3,] -0.29183703  0.47550478          NA -0.05203104 -0.26311149  0.01120055 -0.16521411  0.49215496  0.40571893  0.30595246
[4,]  0.49358124 -0.24284088 -0.05203104          NA  0.60558581  0.53848638  0.80623397 -0.49950396 -0.01080598  0.41798727
[5,]  0.79846794 -0.01898357 -0.26311149  0.60558581          NA  0.33295170  0.53675545 -0.54756131  0.09225002 -0.01925587
[6,]  0.29490246 -0.67102287  0.01120055  0.53848638  0.33295170          NA  0.72936185  0.09463988  0.14607018  0.19487579
[7,]  0.47661166 -0.46488410 -0.16521411  0.80623397  0.53675545  0.72936185          NA -0.46348644 -0.05275132  0.47619940
[8,] -0.51181482  0.01125144  0.49215496 -0.49950396 -0.54756131  0.09463988 -0.46348644          NA  0.64924510  0.06783324
[9,] -0.04116772  0.13355919  0.40571893 -0.01080598  0.09225002  0.14607018 -0.05275132  0.64924510          NA  0.44698207
[10,] -0.10797632  0.08738474  0.30595246  0.41798727 -0.01925587  0.19487579  0.47619940  0.06783324  0.44698207          NA

Finally by executing:

   max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE)#avoiding NA's

one can get:

[1] -0.05275132

the maximum value of the local minima.

Edit:

In order to get the index of matrix

>which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE))
[1]68 77

or in order to get the coordinates:

> which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE), arr.ind = TRUE)
     row col
[1,]   8   7
[2,]   7   8

Answer 3

It looks like to me, that you have spatial points in some projection. One could argue, that the point furthest away from the rest, is the one which lies furthest from the center (the mean coordinates):

library(raster)

set.seed(21)

# create fake points
coords <- data.frame(x=sample(438000:443000,10),y=sample(6695000:6700000,10))

# calculate center
center <- matrix(colMeans(coords),ncol=2)

# red = center, magenta = furthest point (Nr.2)
plot(coords)

# furthest point #2
ix <- which.max(pointDistance(coords,center,lonlat = F))

points(center,col='red',pch='*',cex=3)
points(coords[ix,],col='magenta',pch='*',cex=3)

segments(coords[ix,1],coords[ix,2],center[1,1],center[1,2],col='magenta')

Answer 4

To find the points farthest from the rest of the points you could do something like this. I opted for the median distance as you said the point(s) farthest from the rest of the data. If you have a group of points very close to each other the median should remain robust to this.

There is probably also a way to do this with hierarchical clustering but it is escaping me at the moment.

set.seed(1234)
mat <- rbind(matrix(rnorm(100), ncol=2), c(-5,5), c(-5.25,4.75))
d <- dist(mat)
sort(apply(as.matrix(d), 1, median), decreasing = T)[1:5]
# 51       52       20       12        4 
# 6.828322 6.797696 3.264315 2.806263 2.470919

Answer 5

I wrote up a handy little function you can use for picking from the largest of line distances. You can specify if you want the largest, second largest, and so forth with the n argument.

getBigSegment <- function(x, y, n = 1){
  a <- cbind(x,y)
  d <- as.matrix(dist(a, method = "euclidean"))
  sorted <- order(d, decreasing = T)
  sub <- (1:length(d))[as.logical(1:length(sorted) %% 2)]
  s <- which(d == d[sorted[sub][n]], arr.ind = T)
  t(cbind(a[s[1],], a[s[2],]))
}

With some example data similar to your own you can see:

set.seed(100)
mydata <- data.frame(x = runif(10, 438000, 445000) + rpois(10, 440000), 
                     y = runif(10, 6695000, 6699000) + rpois(10, 6996000))

# The function
getBigSegment(mydata$x, mydata$y)
#            x        y
#[1,] 883552.8 13699108
#[2,] 881338.8 13688458

Below you can visualize how I would use such a function

# easy plotting function
pointsegments <- function(z, ...) {
  segments(z[1,1], z[1,2], z[2,1], z[2,2], ...)
  points(z, pch = 16, col = c("blue", "red"))

}

plot(mydata$x, mydata$y) # points
top3 <- lapply(1:3, getBigSegment, x = mydata$x, y = mydata$y) # top3 longest lines
mycolors <- c("black","blue","green") # 3 colors
for(i in 1:3) pointsegments(top3[[i]], col = mycolors[i]) # plot lines
legend("topleft", legend = round(unlist(lapply(top3, dist))), lty = 1,
       col = mycolors, text.col = mycolors, cex = .8) # legend

Answer 6

This approach first uses chull to identify extreme_points , the points that lie on the boundary of the given points. Then, for each extreme_points , it calculates centroid of the extreme_points by excluding that particular extreme_points . Then it selects the point from extreme_points that's furthest away from the centroid .

foo = function(X = all_points){
    plot(X)
    chull_inds = chull(X)
    extreme_points = X[chull_inds,]
    points(extreme_points, pch = 19, col = "red")
    centroid = t(sapply(1:NROW(extreme_points), function(i)
        c(mean(extreme_points[-i,1]), mean(extreme_points[-i,2]))))
    distances = sapply(1:NROW(extreme_points), function(i)
        dist(rbind(extreme_points[i,], centroid[i,])))
    points(extreme_points[which.max(distances),], pch = 18, cex = 2)
    points(X[chull_inds[which.max(distances)],], cex = 5)
    return(X[chull_inds[which.max(distances)],])
}

set.seed(42)
all_points = data.frame(x = rnorm(25), y = rnorm(25))
foo(X = all_points)
#           x         y
#18 -2.656455 0.7581632

Find most distant point all other points in R

Question

6 answers

solution1
3 ACCPTED 2017-06-28 13:48:57

solution2
1 2017-06-28 13:32:09

Edit:

solution3
1 2017-06-28 13:36:07

solution4
1 2017-06-28 14:00:45

solution5
1 2017-06-28 14:35:02

solution6
1 2017-06-28 14:55:02

Find most distant point all other points in R

Question

6 answers

solution1 3 ACCPTED 2017-06-28 13:48:57

solution2 1 2017-06-28 13:32:09

Edit:

solution3 1 2017-06-28 13:36:07

solution4 1 2017-06-28 14:00:45

solution5 1 2017-06-28 14:35:02

solution6 1 2017-06-28 14:55:02

solution1
3 ACCPTED 2017-06-28 13:48:57

solution2
1 2017-06-28 13:32:09

solution3
1 2017-06-28 13:36:07

solution4
1 2017-06-28 14:00:45

solution5
1 2017-06-28 14:35:02

solution6
1 2017-06-28 14:55:02