
Find the point most distant from all other points in R

I'm having trouble finding a solution to this simple problem. I have been searching the forums and, although I have gotten closer to an answer, it is not exactly what I need.

I'm trying to find, from a set of x,y points, which point is the furthest away from any other point, i.e. not the maximum distance between two points, but the point that is furthest from all the rest.

I've tried:

x <-c(x1,x2,x3....)
y <-c(y1,y2,y3...)
dist(cbind(x,y))

This gives me a matrix of the distances between every pair of points. I can inspect the data in MS Excel and find the answer: find the minimum value in each column, then the maximum across those minima.
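In formula form, writing p_i = (x_i, y_i) for the i-th point, the spreadsheet procedure above (minimum of each column, then the maximum of those minima) computes the largest nearest-neighbour distance d*, and the index i* that attains it is the most isolated point:

```latex
d^{*} = \max_{i} \, \min_{j \neq i} \lVert p_i - p_j \rVert_2,
\qquad
i^{*} = \operatorname*{arg\,max}_{i} \, \min_{j \neq i} \lVert p_i - p_j \rVert_2
```

The condition j ≠ i is what excludes the zero distance from each point to itself.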


If I were to plot the data, I would like the output to be the length of either the red or the blue line (whichever is longer).


Starting from this example data set:

set.seed(100)
x <- rnorm(150)
y <- rnorm(150)
coord <- cbind(x,y)
dobj <- dist(coord)

Now dobj is a dist object, which stores only the lower triangle of the distance matrix, so you can't use matrix functions such as apply() or diag() on it directly. You'll have to convert it to a matrix first, and make sure the zero distances between each point and itself are not taken into account:

dmat <- as.matrix(dobj)
diag(dmat) <- NA

The latter line replaces the diagonal values of the distance matrix with NA.
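To see why the conversion is needed, here is a small sketch on three made-up points: the dist object holds only the n(n-1)/2 pairwise distances, while as.matrix() expands it into the full symmetric matrix on which diag() and apply() work.

```r
# Three made-up points: (0, 0), (3, 0) and (0, 4)
d <- dist(cbind(x = c(0, 3, 0), y = c(0, 0, 4)))
length(d)           # only n * (n - 1) / 2 = 3 distances are stored
m <- as.matrix(d)   # expand to the full symmetric 3 x 3 matrix
diag(m) <- NA       # drop the zero self-distances, as in the answer
m
```

Here m[2, 3] and m[3, 2] both hold the same distance (5), which is the symmetry discussed further down.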

Now you can use amonk's solution:

dmax <- max(apply(dmat,2,min,na.rm=TRUE))

This gives you the largest nearest-neighbour distance, i.e. the distance from the most isolated point to its closest neighbour. If you want to know which points are involved, you can take an extra step:

which(dmat == dmax, arr.ind = TRUE)
#     row col
# 130 130  59
# 59   59 130

So points 130 and 59 are the two points fulfilling your conditions. Plotting them gives:

id <- which(dmat == dmax, arr.ind = TRUE) 
plot(coord)
lines(coord[id[1,],], col = 'red')

Note that you get this information twice, because Euclidean distances are symmetric (A -> B is as long as B -> A).
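As a complementary sketch (the helper name most_isolated and the toy points are made up for illustration), taking the minimum of each row and then which.max() returns the single most isolated point and its nearest-neighbour distance directly, without an exact == comparison on the matrix:

```r
# Hypothetical helper: index and nearest-neighbour distance of the most isolated point
most_isolated <- function(coord) {
  dmat <- as.matrix(dist(coord))
  diag(dmat) <- NA                         # ignore each point's zero distance to itself
  nn <- apply(dmat, 1, min, na.rm = TRUE)  # nearest-neighbour distance of every point
  i <- which.max(nn)                       # point whose nearest neighbour is farthest away
  list(index = unname(i), distance = unname(nn[i]))
}

toy <- cbind(x = c(0, 1, 5), y = c(0, 0, 0))
most_isolated(toy)  # point 3, whose nearest neighbour (point 2) is 4 units away
```

Because the distance matrix is symmetric, row minima and column minima are the same, so this matches the apply(dmat, 2, min, ...) version above.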


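So, with df as your initial data frame, you can do the following. (The example demonstrates the column-minimum/maximum step on a correlation matrix of random data; for your problem, apply the same step to the distance matrix from dist(), with its diagonal set to NA.)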

df <- NULL                    # initialize object
for (i in 1:10)               # create 10 vectors with 10 pseudorandom numbers each
  df <- cbind(df, runif(10))  # fill the matrix column by column

cordf <- cor(df)   # create the correlation matrix
diag(cordf) <- NA  # set the diagonal values to NA

This gives:

             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]        [,9]       [,10]
[1,]          NA -0.03540916 -0.29183703  0.49358124  0.79846794  0.29490246  0.47661166 -0.51181482 -0.04116772 -0.10797632
[2,] -0.03540916          NA  0.47550478 -0.24284088 -0.01898357 -0.67102287 -0.46488410  0.01125144  0.13355919  0.08738474
[3,] -0.29183703  0.47550478          NA -0.05203104 -0.26311149  0.01120055 -0.16521411  0.49215496  0.40571893  0.30595246
[4,]  0.49358124 -0.24284088 -0.05203104          NA  0.60558581  0.53848638  0.80623397 -0.49950396 -0.01080598  0.41798727
[5,]  0.79846794 -0.01898357 -0.26311149  0.60558581          NA  0.33295170  0.53675545 -0.54756131  0.09225002 -0.01925587
[6,]  0.29490246 -0.67102287  0.01120055  0.53848638  0.33295170          NA  0.72936185  0.09463988  0.14607018  0.19487579
[7,]  0.47661166 -0.46488410 -0.16521411  0.80623397  0.53675545  0.72936185          NA -0.46348644 -0.05275132  0.47619940
[8,] -0.51181482  0.01125144  0.49215496 -0.49950396 -0.54756131  0.09463988 -0.46348644          NA  0.64924510  0.06783324
[9,] -0.04116772  0.13355919  0.40571893 -0.01080598  0.09225002  0.14607018 -0.05275132  0.64924510          NA  0.44698207
[10,] -0.10797632  0.08738474  0.30595246  0.41798727 -0.01925587  0.19487579  0.47619940  0.06783324  0.44698207          NA

Finally, by executing:

max(apply(cordf, 2, min, na.rm = TRUE), na.rm = TRUE)  # ignore the NA diagonal

one gets:

[1] -0.05275132

i.e. the maximum of the column minima.

Edit:

To get the linear indices in the matrix:

> which(cordf == max(apply(cordf, 2, min, na.rm = TRUE), na.rm = TRUE))
[1] 69 87

or, to get the row and column coordinates:

> which(cordf == max(apply(cordf, 2, min, na.rm = TRUE), na.rm = TRUE), arr.ind = TRUE)
     row col
[1,]   9   7
[2,]   7   9
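An alternative sketch that avoids comparing floating-point values with == and returns a single (row, column) pair uses which.max() on the column minima and then which.min() inside the winning column. The helper name max_of_col_min and the 3 x 3 toy matrix are made up for illustration; the code assumes the diagonal has already been set to NA.

```r
# Hypothetical helper: position and value of the largest column minimum
max_of_col_min <- function(m) {
  col_min <- apply(m, 2, min, na.rm = TRUE)  # minimum of every column
  j <- which.max(col_min)                    # column whose minimum is largest
  i <- which.min(m[, j])                     # row where that minimum sits (NA ignored)
  c(row = unname(i), col = unname(j), value = unname(col_min[j]))
}

toy_m <- matrix(c(NA,  2,  5,
                   2, NA,  4,
                   5,  4, NA), nrow = 3, byrow = TRUE)
max_of_col_min(toy_m)  # row 2, column 3, value 4
```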

It looks to me like you have spatial points in some projection. One could argue that the point furthest away from the rest is the one lying furthest from the center (the mean coordinates):

library(raster)

set.seed(21)

# create fake points
coords <- data.frame(x=sample(438000:443000,10),y=sample(6695000:6700000,10))

# calculate center
center <- matrix(colMeans(coords),ncol=2)

# red = center, magenta = furthest point (Nr.2)
plot(coords)

# furthest point #2
ix <- which.max(pointDistance(coords,center,lonlat = F))

points(center,col='red',pch='*',cex=3)
points(coords[ix,],col='magenta',pch='*',cex=3)

segments(coords[ix,1],coords[ix,2],center[1,1],center[1,2],col='magenta')
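If you would rather not depend on the raster package, the same "furthest from the mean coordinates" idea can be sketched in base R, assuming projected (planar) coordinates so that plain Euclidean distance is appropriate. The toy points below are made up for illustration.

```r
# Made-up planar coordinates
pts <- data.frame(x = c(0, 2, 0, 10), y = c(0, 0, 2, 10))
center <- colMeans(pts)  # named vector: mean x, mean y
# Euclidean distance from each point to the center
d_center <- sqrt((pts$x - center[["x"]])^2 + (pts$y - center[["y"]])^2)
ix <- which.max(d_center)  # index of the point furthest from the center
pts[ix, ]                  # the point (10, 10)
```

Note that this criterion can disagree with the nearest-neighbour criterion from the first answer, for example when an outlying point has a close companion.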


To find the points farthest from the rest of the points, you could do something like this. I opted for the median distance since you asked for the point(s) farthest from the rest of the data: if there is a group of points very close to each other, the median should remain robust to it.

There is probably also a way to do this with hierarchical clustering, but it escapes me at the moment.

set.seed(1234)
mat <- rbind(matrix(rnorm(100), ncol=2), c(-5,5), c(-5.25,4.75))
d <- dist(mat)
sort(apply(as.matrix(d), 1, median), decreasing = T)[1:5]
# 51       52       20       12        4 
# 6.828322 6.797696 3.264315 2.806263 2.470919 
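The median in the code above is taken over each full row of the distance matrix, which includes the point's zero distance to itself. A small variant that excludes it by setting the diagonal to NA first might look like this (the helper name median_dist and the toy points are made up for illustration):

```r
# Hypothetical helper: median distance from each point to all *other* points
median_dist <- function(coord) {
  dmat <- as.matrix(dist(coord))
  diag(dmat) <- NA                       # exclude the zero self-distance
  apply(dmat, 1, median, na.rm = TRUE)
}

toy <- rbind(c(0, 0), c(1, 0), c(0, 1), c(10, 10))
md <- median_dist(toy)
sort(md, decreasing = TRUE)  # point 4 comes first
```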

I wrote up a handy little function you can use to pick out the longest line segments between points. With the n argument you can specify whether you want the longest, the second longest, and so forth.

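getBigSegment <- function(x, y, n = 1) {
  a <- cbind(x, y)
  d <- as.matrix(dist(a, method = "euclidean"))
  # order all matrix entries from longest to shortest distance
  sorted <- order(d, decreasing = TRUE)
  # d is symmetric, so each distance appears twice in a row: keep every other entry
  sub <- sorted[c(TRUE, FALSE)]
  # the two cells holding the n-th longest distance, (i, j) and (j, i)
  s <- which(d == d[sub[n]], arr.ind = TRUE)
  # coordinates of the two endpoints, one point per row
  a[s[1:2, "row"], ]
}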

With some example data similar to yours, you can see:

set.seed(100)
mydata <- data.frame(x = runif(10, 438000, 445000) + rpois(10, 440000), 
                     y = runif(10, 6695000, 6699000) + rpois(10, 6996000))

# The function
getBigSegment(mydata$x, mydata$y)
#            x        y
#[1,] 883552.8 13699108
#[2,] 881338.8 13688458    

Below you can see how I would use such a function:

# easy plotting function
pointsegments <- function(z, ...) {
  segments(z[1,1], z[1,2], z[2,1], z[2,2], ...)
  points(z, pch = 16, col = c("blue", "red"))

}

plot(mydata$x, mydata$y) # points
top3 <- lapply(1:3, getBigSegment, x = mydata$x, y = mydata$y) # top3 longest lines
mycolors <- c("black","blue","green") # 3 colors
for(i in 1:3) pointsegments(top3[[i]], col = mycolors[i]) # plot lines
legend("topleft", legend = round(unlist(lapply(top3, dist))), lty = 1,
       col = mycolors, text.col = mycolors, cex = .8) # legend
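An alternative way to count each pair only once, instead of relying on the "every other entry" trick, is to blank out the upper triangle and diagonal before ordering. This is a sketch under that assumption; the helper name nth_longest_pair and the toy points are made up for illustration, and arrayInd() converts the linear index back to a (row, column) pair.

```r
# Hypothetical alternative: endpoints of the n-th longest pair, each pair counted once
nth_longest_pair <- function(coord, n = 1) {
  dmat <- as.matrix(dist(coord))
  dmat[upper.tri(dmat, diag = TRUE)] <- NA            # keep only the lower triangle
  o <- order(dmat, decreasing = TRUE, na.last = NA)   # drop NAs, longest first
  idx <- arrayInd(o[n], dim(dmat))                    # (row, col) of the n-th longest
  coord[as.vector(idx), , drop = FALSE]
}

toy <- cbind(x = c(0, 1, 5), y = c(0, 0, 0))
nth_longest_pair(toy, 1)  # points 3 and 1 (distance 5)
nth_longest_pair(toy, 2)  # points 3 and 2 (distance 4)
```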


This approach first uses chull to identify extreme_points, the points that lie on the convex hull (the boundary) of the given points. Then, for each of the extreme_points, it calculates the centroid of the remaining extreme_points, excluding that particular point. Finally, it selects the point from extreme_points that is furthest away from its corresponding centroid.

foo = function(X = all_points){
    plot(X)
    chull_inds = chull(X)
    extreme_points = X[chull_inds,]
    points(extreme_points, pch = 19, col = "red")
    centroid = t(sapply(1:NROW(extreme_points), function(i)
        c(mean(extreme_points[-i,1]), mean(extreme_points[-i,2]))))
    distances = sapply(1:NROW(extreme_points), function(i)
        dist(rbind(extreme_points[i,], centroid[i,])))
    points(extreme_points[which.max(distances),], pch = 18, cex = 2)
    points(X[chull_inds[which.max(distances)],], cex = 5)
    return(X[chull_inds[which.max(distances)],])
}

set.seed(42)
all_points = data.frame(x = rnorm(25), y = rnorm(25))
foo(X = all_points)
#           x         y
#18 -2.656455 0.7581632
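The per-point loop over sapply() can also be written in vectorized form without any plotting, using the fact that the centroid of the other k - 1 hull points is (sum of all hull points - p_i) / (k - 1). This is a sketch of the same leave-one-out idea; the helper name farthest_hull_point and the toy points are made up for illustration.

```r
# Hypothetical non-plotting version of the convex-hull approach
farthest_hull_point <- function(X) {
  X <- as.matrix(X)
  hull <- chull(X)                    # row indices of the convex-hull vertices
  H <- X[hull, , drop = FALSE]
  k <- nrow(H)
  # leave-one-out centroid of the hull points: (column sums - point i) / (k - 1)
  loo <- (matrix(colSums(H), k, 2, byrow = TRUE) - H) / (k - 1)
  d <- sqrt(rowSums((H - loo)^2))     # distance of each hull point to its centroid
  hull[which.max(d)]                  # row index in X of the selected point
}

X <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(10, 0.5), c(0.5, 0.5))
farthest_hull_point(X)  # 5, the point (10, 0.5)
```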


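Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.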

Related questions:
choose n most distant points in R
Calculate distance of one point in DF with all other points in R
Choose n most evenly spread points across point dataset in R
Find xy coordinates of a point knowing its distance to other 2 points
Find the Number of Points Within a Certain Radius of a Data Point in R
Calculating all distances between one point and a group of points efficiently in R
How to find the distance of a point to a line (knowing all the points)?
Rasterize points by 'most frequent' in R
What is the most efficient way of extracting some numbers from a data point in R? (Plus other specific steps!)
R - How do I draw a radius around a point and use that result to filter other points?
 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM