Find the point most distant from all other points in R
I'm having trouble finding a solution to this simple problem. I have been searching the forums, and although I have gotten closer to an answer, it is not exactly what I need.
I'm trying to find, from a set of x,y points, the point that is furthest away from any other point, i.e. not the maximum distance between two points, but the point furthest from the rest.
I've tried:
x <-c(x1,x2,x3....)
y <-c(y1,y2,y3...)
dist(cbind(x,y))
Which gives me a matrix of the distances between each pair of points. I can interrogate the data in MS Excel and find the answer: find the minimum value in each column, then take the maximum of those minima.
If I were to plot the data, I would like the output to be the length of either the red or the blue line (whichever is longer).
Starting from this example data set:
set.seed(100)
x <- rnorm(150)
y <- rnorm(150)
coord <- cbind(x,y)
dobj <- dist(coord)
Now dobj is a distance object, but you can't examine that directly. You'll have to convert it to a matrix first, and make sure you don't take the zero distance between a point and itself into account:
dmat <- as.matrix(dobj)
diag(dmat) <- NA
The latter line replaces the diagonal values in the distance matrix with NA.
Now you can use the solution of amonk:
dmax <- max(apply(dmat,2,min,na.rm=TRUE))
This gives you the maximum distance to the nearest point. If you want to know which points these are, you can take an extra step:
which(dmat == dmax, arr.ind = TRUE)
# row col
# 130 130 59
# 59 59 130
So points 130 and 59 are the two points fulfilling your conditions (the most isolated point and its nearest neighbour). Plotting this gives you:
id <- which(dmat == dmax, arr.ind = TRUE)
plot(coord)
lines(coord[id[1,],], col = 'red')
Note how you get this info twice, as Euclidean distances between two points are symmetric (A -> B is as long as B -> A).
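The steps above can be wrapped into one small function. This is only a sketch: most_isolated_point is a hypothetical helper name, not something from the answer, and the four points are made-up values chosen so the result is easy to check by hand. It uses which.max on the per-point nearest-neighbour distances, so it returns one point (the isolated one) rather than both ends of the segment.

```r
# Hypothetical helper: index of the point whose nearest neighbour is furthest
# away, the index of that neighbour, and the distance between them.
most_isolated_point <- function(x, y) {
  dmat <- as.matrix(dist(cbind(x, y)))
  diag(dmat) <- NA                               # ignore distance of a point to itself
  nn_dist <- apply(dmat, 2, min, na.rm = TRUE)   # nearest-neighbour distance per point
  i <- which.max(nn_dist)                        # first maximum if there are ties
  list(index    = unname(i),
       nearest  = unname(which.min(dmat[, i])),  # which.min skips the NA on the diagonal
       distance = unname(nn_dist[i]))
}

# Made-up example: three clustered points and one far away
x <- c(0, 1, 0, 10)
y <- c(0, 0, 1, 10)
res <- most_isolated_point(x, y)
res$index      # 4
res$distance   # sqrt(181), the distance from (10, 10) to (1, 0)
```

If several points tie for the largest nearest-neighbour distance, only the first is reported; the which(dmat == dmax, arr.ind = TRUE) form lists all of them.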
So for df as your initial data frame you can perform the following:
df <- NULL                     # initialize object
for (i in 1:10)                # create 10 vectors with 10 pseudorandom numbers each
  df <- cbind(df, runif(10))   # append each vector as a new column (df is a matrix)
cordf <- cor(df)               # create correlation matrix
diag(cordf) <- NA              # set diagonal values to NA
Hence:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA -0.03540916 -0.29183703 0.49358124 0.79846794 0.29490246 0.47661166 -0.51181482 -0.04116772 -0.10797632
[2,] -0.03540916 NA 0.47550478 -0.24284088 -0.01898357 -0.67102287 -0.46488410 0.01125144 0.13355919 0.08738474
[3,] -0.29183703 0.47550478 NA -0.05203104 -0.26311149 0.01120055 -0.16521411 0.49215496 0.40571893 0.30595246
[4,] 0.49358124 -0.24284088 -0.05203104 NA 0.60558581 0.53848638 0.80623397 -0.49950396 -0.01080598 0.41798727
[5,] 0.79846794 -0.01898357 -0.26311149 0.60558581 NA 0.33295170 0.53675545 -0.54756131 0.09225002 -0.01925587
[6,] 0.29490246 -0.67102287 0.01120055 0.53848638 0.33295170 NA 0.72936185 0.09463988 0.14607018 0.19487579
[7,] 0.47661166 -0.46488410 -0.16521411 0.80623397 0.53675545 0.72936185 NA -0.46348644 -0.05275132 0.47619940
[8,] -0.51181482 0.01125144 0.49215496 -0.49950396 -0.54756131 0.09463988 -0.46348644 NA 0.64924510 0.06783324
[9,] -0.04116772 0.13355919 0.40571893 -0.01080598 0.09225002 0.14607018 -0.05275132 0.64924510 NA 0.44698207
[10,] -0.10797632 0.08738474 0.30595246 0.41798727 -0.01925587 0.19487579 0.47619940 0.06783324 0.44698207 NA
Finally, by executing:
max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE)#avoiding NA's
one can get:
[1] -0.05275132
the maximum of the column-wise minima.
To get the linear index within the matrix:
> which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE))
[1] 68 77
or, to get the row and column coordinates:
> which(cordf==max(apply(cordf,2,min,na.rm=TRUE),na.rm = TRUE), arr.ind = TRUE)
row col
[1,] 8 7
[2,] 7 8
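As a sketch of the same idea without the repeated max(apply(...)) call, which.max on the column minima gives the column directly, and which.min inside that column gives the row. The 3 x 3 matrix below is made up for illustration; the names col_min, best_col and best_row are my own.

```r
# Made-up symmetric matrix with NA on the diagonal
m <- matrix(c(NA,  2,  9,
               2, NA,  7,
               9,  7, NA), nrow = 3, byrow = TRUE)

col_min  <- apply(m, 2, min, na.rm = TRUE)  # minimum of each column: 2 2 7
best_col <- which.max(col_min)              # column with the largest minimum: 3
best_row <- which.min(m[, best_col])        # row holding that minimum: 2
m[best_row, best_col]                       # 7
```

Unlike the which(..., arr.ind = TRUE) form, this reports a single cell, with the column being the one whose minimum is largest.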
It looks to me like you have spatial points in some projection. One could argue that the point furthest away from the rest is the one that lies furthest from the center (the mean coordinates):
library(raster)
set.seed(21)
# create fake points
coords <- data.frame(x=sample(438000:443000,10),y=sample(6695000:6700000,10))
# calculate center
center <- matrix(colMeans(coords),ncol=2)
# red = center, magenta = furthest point (Nr.2)
plot(coords)
# furthest point #2
ix <- which.max(pointDistance(coords,center,lonlat = F))
points(center,col='red',pch='*',cex=3)
points(coords[ix,],col='magenta',pch='*',cex=3)
segments(coords[ix,1],coords[ix,2],center[1,1],center[1,2],col='magenta')
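If you would rather not load raster just for this, the same distance-to-center idea can be sketched in base R. This assumes planar (projected) coordinates with the same units on both axes, and the five points are made up for illustration. Keep in mind that the mean is itself pulled towards an outlier.

```r
# Made-up planar coordinates: four points in a square and one far away
coords <- data.frame(x = c(0, 2, 0, 2, 10),
                     y = c(0, 0, 2, 2, 10))

center <- colMeans(coords)   # mean x and mean y, here (2.8, 2.8)

# Euclidean distance of every point to the center
d_center <- sqrt((coords$x - center[["x"]])^2 + (coords$y - center[["y"]])^2)

ix <- which.max(d_center)    # 5
coords[ix, ]                 # the point furthest from the center
```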
To find the points farthest from the rest of the points, you could do something like this. I opted for the median distance, since you said the point(s) farthest from the rest of the data. If you have a group of points very close to each other, the median should remain robust to this.
There is probably also a way to do this with hierarchical clustering, but it escapes me at the moment.
set.seed(1234)
mat <- rbind(matrix(rnorm(100), ncol=2), c(-5,5), c(-5.25,4.75))
d <- dist(mat)
sort(apply(as.matrix(d), 1, median), decreasing = T)[1:5]
# 51 52 20 12 4
# 6.828322 6.797696 3.264315 2.806263 2.470919
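For the hierarchical clustering idea, one possible sketch (an assumption on my part, not a worked-out method) is single-linkage clustering with hclust, cutting the tree into two groups and treating the smaller group as the isolated points. The data are made up so the outcome is easy to verify, and k = 2 assumes there is exactly one isolated group.

```r
# Made-up data: a 3 x 3 grid plus two far-away points that sit close together
grid_pts <- as.matrix(expand.grid(x = 0:2, y = 0:2))
pts <- rbind(grid_pts, c(10, 10), c(10.5, 10))

hc  <- hclust(dist(pts), method = "single")  # single linkage joins nearest neighbours first
grp <- cutree(hc, k = 2)                     # undo the last (largest) merge

small    <- which.min(tabulate(grp))         # id of the smaller cluster
outliers <- unname(which(grp == small))      # 10 11
```

Like the median approach, this flags both points of a tight, remote pair, which a pure nearest-neighbour criterion would miss.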
I wrote up a handy little function you can use for picking from the largest line distances. You can specify whether you want the largest, second largest, and so forth with the n argument.
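getBigSegment <- function(x, y, n = 1){
  a <- cbind(x,y)
  d <- as.matrix(dist(a, method = "euclidean"))
  sorted <- order(d, decreasing = TRUE)
  # each distance appears twice in the symmetric matrix, so keep every other position
  sub <- (1:length(d))[as.logical(1:length(sorted) %% 2)]
  # rows of the two cells holding the n-th largest distance are the two endpoints
  s <- which(d == d[sorted[sub][n]], arr.ind = TRUE)
  t(cbind(a[s[1],], a[s[2],]))
}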
With some example data similar to your own you can see:
set.seed(100)
mydata <- data.frame(x = runif(10, 438000, 445000) + rpois(10, 440000),
y = runif(10, 6695000, 6699000) + rpois(10, 6996000))
# The function
getBigSegment(mydata$x, mydata$y)
# x y
#[1,] 883552.8 13699108
#[2,] 881338.8 13688458
Below you can visualize how I would use such a function:
# easy plotting function
pointsegments <- function(z, ...) {
segments(z[1,1], z[1,2], z[2,1], z[2,2], ...)
points(z, pch = 16, col = c("blue", "red"))
}
plot(mydata$x, mydata$y) # points
top3 <- lapply(1:3, getBigSegment, x = mydata$x, y = mydata$y) # top3 longest lines
mycolors <- c("black","blue","green") # 3 colors
for(i in 1:3) pointsegments(top3[[i]], col = mycolors[i]) # plot lines
legend("topleft", legend = round(unlist(lapply(top3, dist))), lty = 1,
col = mycolors, text.col = mycolors, cex = .8) # legend
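A variation on the same idea, as a sketch only: blank out the upper triangle so every pair of points is counted once, then order the remaining distances. nth_longest_pair is a hypothetical name and the three points are made up for illustration.

```r
# Hypothetical alternative: return the indices and length of the n-th longest segment
nth_longest_pair <- function(x, y, n = 1) {
  d <- as.matrix(dist(cbind(x, y)))
  d[upper.tri(d, diag = TRUE)] <- NA                       # keep each pair once
  idx <- order(d, decreasing = TRUE, na.last = NA)[n]      # na.last = NA drops the NAs
  list(pair   = as.vector(arrayInd(idx, dim(d))),          # (row, col) = the two point indices
       length = d[idx])
}

# Made-up example: pairwise distances are 1, 5 and sqrt(26)
x <- c(0, 1, 0)
y <- c(0, 0, 5)
p1 <- nth_longest_pair(x, y, 1)   # points 3 and 2, length sqrt(26)
p2 <- nth_longest_pair(x, y, 2)   # points 3 and 1, length 5
```

Because each pair appears only once, tied distances between different pairs are still ranked separately.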
This approach first uses chull to identify extreme_points, the points that lie on the boundary (convex hull) of the given points. Then, for each of the extreme_points, it calculates the centroid of the remaining extreme_points, excluding that particular point. Finally, it selects the point from extreme_points that is furthest away from its centroid.
foo = function(X = all_points){
plot(X)
chull_inds = chull(X)
extreme_points = X[chull_inds,]
points(extreme_points, pch = 19, col = "red")
centroid = t(sapply(1:NROW(extreme_points), function(i)
c(mean(extreme_points[-i,1]), mean(extreme_points[-i,2]))))
distances = sapply(1:NROW(extreme_points), function(i)
dist(rbind(extreme_points[i,], centroid[i,])))
points(extreme_points[which.max(distances),], pch = 18, cex = 2)
points(X[chull_inds[which.max(distances)],], cex = 5)
return(X[chull_inds[which.max(distances)],])
}
set.seed(42)
all_points = data.frame(x = rnorm(25), y = rnorm(25))
foo(X = all_points)
# x y
#18 -2.656455 0.7581632