简体   繁体   English

在R中的点数据集中选择n个最均匀的点差点

[英]Choose n most evenly spread points across point dataset in R

Given a set of points, I am trying to select a subset of n points that are most evenly distributed across this set of points. 给定一组点,我试图选择在这组点上均匀分布的n个点的子集。 In other words, I am trying to thin out the dataset while still evenly sampling across space. 换句话说,我正在尝试稀释数据集,同时仍然在空间中均匀采样。

So far, I have the following, but this approach likely won't do well with larger datasets. 到目前为止,我有以下内容,但这种方法可能不适用于较大的数据集。 Maybe there is a more intelligent way to choose the subset of points in the first place... The following code randomly chooses a subset of the points, and seeks to minimize the distance between the points within this subset and the points outside of this subset. 也许有一种更智能的方法来首先选择点的子集......以下代码随机选择点的子集,并寻求最小化该子集内的点与该子集外的点之间的距离。

Suggestions appreciated! 建议赞赏!

evenSubset <- function(xy, n) {

    bestdist <- NA
    bestSet <- NA
    alldist <- as.matrix(dist(xy))
    diag(alldist) <- NA
    alldist[upper.tri(alldist)] <- NA
    for (i in 1:1000){
        subset <- sample(1:nrow(xy),n)
        subdists <- alldist[subset,-subset]
        distsum <- sum(subdists,na.rm=T)
        if (distsum < bestdist | is.na(bestdist)) {
            bestdist <- distsum
            bestSet <- subset
        }
    }
    return(xy[bestSet,])
}

xy2 <- evenSubset(xy=cbind(rnorm(1000),rnorm(1000)), n=20)
plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)

Following @Spacedman's suggestion, I used voronoi tesselation to identify and drop those points that were closest to other points. 按照@Spasdman的建议,我使用voronoi tesselation来识别和删除那些最接近其他点的点。

Here, the percentage of points to drop is given to the function. 这里,下降点的百分比给予函数。 This appears to work quite well, except for the fact that it is slow with large datasets. 这看起来效果很好,除了大数据集速度慢的事实。

library(tripack)
voronoiFilter <- function(occ,drop) {
    n <- round(x=(nrow(occ) * drop),digits=0)
    subset <- occ
    dropped <- vector()
    for (i in 1:n) {
        v <- voronoi.mosaic(x=subset[,'Longitude'],y=subset[,'Latitude'],duplicate='error')
        info <- cells(v)
        areas <- unlist(lapply(info,function(x) x$area))
        smallest <- which(areas == min(areas,na.rm=TRUE))
        dropped <- c(dropped,which(paste(occ[,'Longitude'],occ[,'Latitude'],sep='_') == paste(subset[smallest,'Longitude'],subset[smallest,'Latitude'],sep='_')))
        subset <- subset[-smallest,]
    }
    return(occ[-dropped,])
}

xy <- cbind(rnorm(500),rnorm(500))
colnames(xy) <- c('Longitude','Latitude')
xy2 <- voronoiFilter(xy, drop=0.7)

plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)

在此输入图像描述

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM