
Spatial filtering by proximity in R

I have occurrence points for a species, and I'd like to remove potential sampling bias (some regions may have a much greater density of points than others). One way to do this would be to select a maximal subset of points that are all at least a certain distance X from each other. Essentially, I would prevent points from being too close to each other.

Are there any existing R functions to do this? I've searched through various spatial packages, but haven't found anything, and can't figure out exactly how to implement this myself.

An example occurrence point dataset can be downloaded here.

Thanks!

Following Josh O'Brien's advice, I looked at spatstat's rMaternI function, and came up with the following. It seems to work pretty well.

The distance is in map units. It would be nice to incorporate one of R's distance functions that always returns distances in meters rather than in the input units, but I couldn't figure that out...

require(spatstat)
require(maptools)

occ <- readShapeSpatial('occurrence_example.shp')

# keep only points whose nearest neighbor is farther away than 'dist'
# (dist is in map units; note this drops BOTH members of any close pair)
filterByProximity <- function(occ, dist) {
    pts <- as.ppp.SpatialPoints(occ)  # convert to a spatstat point pattern
    d <- nndist(pts)                  # nearest-neighbor distance of each point
    z <- which(d > dist)              # indices of sufficiently isolated points
    return(occ[z, ])
}

occ2 <- filterByProximity(occ, dist = 0.2)
plot(occ)
plot(occ2, add = TRUE, col = 'blue', pch = 20)

I've written a new version of this function that no longer really follows rMaternII. The input can be a SpatialPoints, SpatialPointsDataFrame, or matrix object.

Seems to work well, but suggestions welcome!

require(sp)  # provides spDists()

filterByProximity <- function(xy, dist, mapUnits = FALSE) {
    # xy can be a SpatialPoints or SpatialPointsDataFrame object, or a matrix
    # dist is in km if mapUnits = FALSE, in map units otherwise
    if (!mapUnits) {
        d <- spDists(xy, longlat = TRUE)   # great-circle distances in km
    } else {
        d <- spDists(xy, longlat = FALSE)  # Euclidean distances in map units
    }
    diag(d) <- NA
    close <- (d <= dist)  # TRUE for every pair closer than 'dist'
    diag(close) <- NA
    closePts <- which(close, arr.ind = TRUE)  # one row per offending pair
    discard <- matrix(nrow = 2, ncol = 2)
    if (nrow(closePts) > 0) {
        while (nrow(closePts) > 0) {
            pair <- paste(closePts[1, 1], closePts[1, 2], sep = '_')
            revPair <- paste(closePts[1, 2], closePts[1, 1], sep = '_')
            discarded <- paste(discard[, 1], discard[, 2], sep = '_')
            if (!pair %in% discarded & !revPair %in% discarded) {
                # discard the first point of the pair, then drop every
                # remaining pair that involves it
                discard <- rbind(discard, closePts[1, ])
                closePts <- closePts[-union(which(closePts[, 1] == closePts[1, 1]),
                                            which(closePts[, 2] == closePts[1, 1])),
                                     , drop = FALSE]  # drop = FALSE keeps it a matrix
            }
        }
        # remove the NA rows that initialized 'discard'
        discard <- discard[complete.cases(discard), , drop = FALSE]
        return(xy[-discard[, 1], ])
    } else {
        return(xy)
    }
}

Let's test it:

require(rgeos)
require(sp)

pts <- readWKT("MULTIPOINT ((3.5 2), (1 1), (2 2), (4.5 3), (4.5 4.5), (5 5), (1 5))")

pts2 <- filterByProximity(pts, dist = 2, mapUnits = TRUE)

plot(pts)
axis(1)
axis(2)
# draw a 2-unit buffer circle around every original point
apply(as.data.frame(pts), 1, function(x)
    plot(gBuffer(SpatialPoints(coords = matrix(c(x[1], x[2]), nrow = 1)), width = 2), add = TRUE))
plot(pts2, add = TRUE, col = 'blue', pch = 20, cex = 2)

[Figure: the seven test points with 2-unit buffer circles; the retained points are shown in blue]

There is also an R package called spThin that performs spatial thinning on point data. It was developed to reduce the effects of sampling bias in species distribution models, and it runs multiple random iterations to keep as many points as possible. The function is quite easy to use; the vignette can be found here. There is also a paper in Ecography with details about the technique.
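For reference, a minimal sketch of a typical spThin call (the data frame and its values here are invented for illustration; thin.par is the thinning distance in km):

require(spThin)

# hypothetical occurrence records: species, longitude, latitude
occs <- data.frame(SPEC = 'myspecies',
                   LONG = runif(100, -10, 10),
                   LAT  = runif(100, 40, 50))

# keep points so that no two are within 10 km of each other;
# 100 random replicates are run and the largest thinned dataset(s) kept
thinned <- thin(loc.data = occs,
                lat.col = 'LAT', long.col = 'LONG', spec.col = 'SPEC',
                thin.par = 10, reps = 100,
                locs.thinned.list.return = TRUE,
                write.files = FALSE, write.log.file = FALSE)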

Rather than removing data points, you might consider spatial declustering. This means giving points in clusters a lower weight than outlying points. The two simplest ways to do this involve a polygonal segmentation, such as a Voronoi diagram, or some arbitrary grid. Both methods weight the points in each region according to the area of the region.

For example, if we take the points (1,1), (2,2), (4.5,4.5), (5,5), (1,5) from your test and apply a regular 2-by-2 mesh in which each cell is three units on a side, the five points fall into three cells. The points (1,1) and (2,2) fall into the cell [0,3] x [0,3] and each get weight 1/(no. of points in the cell * total no. of occupied cells) = 1/(2 * 3). The same goes for the points (4.5,4.5) and (5,5) in the cell (3,6] x (3,6]. The "outlier" (1,5) gets weight 1/(1 * 3). The nice thing about this technique is that it is a quick way to generate a density-based weighting scheme.
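A minimal sketch of this grid weighting in base R (the point set and cell size follow the example above; note that floor() makes the cells left-closed, which makes no difference for these points):

# points from the example above
xy <- cbind(x = c(1, 2, 4.5, 5, 1),
            y = c(1, 2, 4.5, 5, 5))

cellSize <- 3
# label each point with the id of the grid cell it falls in
cellID <- paste(floor(xy[, 'x'] / cellSize), floor(xy[, 'y'] / cellSize))

nPerCell <- table(cellID)   # number of points per occupied cell
nCells <- length(nPerCell)  # number of occupied cells (3 here)

# weight = 1 / (points in the point's cell * number of occupied cells)
w <- 1 / (as.numeric(nPerCell[cellID]) * nCells)
w        # 1/6, 1/6, 1/6, 1/6, 1/3
sum(w)   # the weights sum to 1 by construction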

A polygonal segmentation involves drawing a polygon around each point and using the area of that polygon to calculate its weight. Generally, the polygons completely cover the region of interest, and each point's weight is taken proportional to the area of its polygon: points in dense clusters get small polygons and hence small weights. A Voronoi diagram is usually used for this, but polygonal segmentations may be computed with other techniques, or even specified by hand.
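For the Voronoi version, the deldir package computes the tile areas directly; a sketch assuming area-proportional weights (the bounding rectangle rw is an arbitrary choice that affects the edge tiles):

require(deldir)

x <- c(1, 2, 4.5, 5, 1)
y <- c(1, 2, 4.5, 5, 5)

# Dirichlet (Voronoi) tessellation, clipped to the rectangle [0,6] x [0,6]
dd <- deldir(x, y, rw = c(0, 6, 0, 6))

dd$summary$dir.area      # area of each point's Voronoi tile
w <- dd$summary$dir.wts  # tile areas normalized to sum to 1
w                        # clustered points receive the smallest weights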
