
Removing Spatial Outliers (lat and long coordinates) in R

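I have done my best to read up on this, and I think I have found the most appropriate procedure, but if anyone has other ideas, functions, or different methods, I would greatly appreciate them. I have a list of small data frames with varying numbers of rows, and each data frame contains multiple latitude and longitude coordinates in separate columns. For each item in the list, I need to remove one possibly outlying coordinate pair and then find the mean center of the remaining coordinates (so each item in the list should end up with a single coordinate pair).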

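The method I have read about is to find the mean center by averaging the latitudes and the longitudes separately, then calculate the Euclidean distance from that mean center to each coordinate pair and remove the points that lie beyond a chosen distance (say 100 m). Finally, the mean center of the remaining points is taken as the result. This seems a bit convoluted to me, though, so if anyone has a better suggestion for removing coordinate outliers, I would like to hear it.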
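The procedure described above can be sketched for a single data frame as follows. This is only an illustration under stated assumptions: the helper name `trimmed_mean_center`, the sample points, and the constant of roughly 111,320 m per degree of latitude are not from the original post, and the degrees-to-metres conversion is a local planar approximation that is reasonable only over short distances.

```r
# Hypothetical helper: mean center after dropping points that lie farther
# than max_dist_m from the initial mean center. Expects numeric columns
# `lon` and `lat` in decimal degrees.
trimmed_mean_center <- function(d, max_dist_m = 100) {
  m_per_deg <- 111320                 # approx. metres per degree of latitude (assumption)
  lon0 <- mean(d$lon)                 # initial mean center
  lat0 <- mean(d$lat)
  # Local planar offsets in metres; longitude is scaled by cos(latitude)
  dx <- (d$lon - lon0) * m_per_deg * cos(lat0 * pi / 180)
  dy <- (d$lat - lat0) * m_per_deg
  keep <- sqrt(dx^2 + dy^2) <= max_dist_m   # Euclidean distance test
  c(lon = mean(d$lon[keep]), lat = mean(d$lat[keep]), n_kept = sum(keep))
}

# Made-up points: three close together and one roughly 200 m to the east
pts <- data.frame(
  lon = c(-117.23041, -117.23040, -117.23039, -117.22800),
  lat = c(32.81217, 32.81218, 32.81219, 32.81218)
)
res <- trimmed_mean_center(pts)
print(res)
```

Note that the outlier also pulls the initial mean center toward itself, so with very few points or a very distant outlier every point can fall outside the threshold, in which case this sketch returns `NaN` for the center.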

Here is the code I have so far:

dfList <- structure(list(`43` = structure(list(date = c("43 2011-04-06", "43 2011-04-07", "43 2011-04-08"), identifier = c(43, 43, 43), lon = c(-117.23041303, -117.23040817, -117.23039471), lat = c(32.81217294, 32.81218158, 32.81218645)), .Names = c("date", "identifier", "lon", "lat"), row.names = 13:15, class = "data.frame"), `44` = structure(list(date = c("44 2011-04-06", "44 2011-04-07", "44 2011-04-08"), identifier = c(44, 44, 44), lon = c(-117.22864227, -117.22861559, -117.22862265), lat = c(32.81257756, 32.81257089, 32.81257197)), .Names = c("date", "identifier", "lon", "lat"), row.names = 19:21, class = "data.frame"), `46` = structure(list(date = c("46 2011-04-06", "46 2011-04-07", "46 2011-04-08", "46 2011-04-09", "46 2011-04-10", "46 2011-04-11"), identifier = c(46, 46, 46, 46, 46, 46), lon = c(-117.22992617, -117.2289396895, -117.22965116, -117.23003928, -117.229922602, -117.22969664), lat = c(32.81295118, 32.8128226975, 32.81317299, 32.81224457, 32.813018734, 32.81276993)), .Names = c("date", "identifier", "lon", "lat"), row.names = 25:30, class = "data.frame"), `47` = structure(list(date = c("47 2011-04-06", "47 2011-04-07"), identifier = c(47, 47), lon = c(-117.2274484, -117.22747116), lat = c(32.81205838, 32.81207607)), .Names = c("date", "identifier", "lon", "lat"), row.names = 31:32, class = "data.frame")), .Names = c("43", "44", "46", "47"))

lonMean <- lapply(dfList, function(x) mean(x$lon)) #taking mean for longs
latMean <- lapply(dfList, function(x) mean(x$lat)) #taking mean for lats
latLon <- mapply(c, lonMean, latMean, SIMPLIFY=FALSE)#combining coord lists into one

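Edit: What I need now is to compute the distances between all the coordinates of each item in the first list and the matching mean coordinates in the second list, and then remove from the first list every point whose distance is greater than 100 m. I have used dist and geodist (from the 'gmt' package) before, but I am not sure how to use them with these two lists. After that, I would narrow down the possible outliers further. Thanks in advance for your help. I am not the savviest with R, so any help is much appreciated!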

Try this.

df <- do.call("rbind", dfList) # Flattens list into data frame, preserving 
                               # group identifier

# Great-circle (haversine) distance in kilometers between two points.
# Arguments are in decimal degrees and may be vectors.
earth.dist <- function(long1, lat1, long2, lat2)
{
  rad <- pi/180                 # degrees to radians
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145                 # Earth's equatorial radius in km
  d <- R * c
  return(d)
}

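# Distance (km) from each point to the mean center of its own group;
# ave() returns the per-identifier mean, repeated for every row
df$dist <- earth.dist(df$lon, df$lat,
                      ave(df$lon, df$identifier), ave(df$lat, df$identifier))

df[df$dist >= 0.1, ] # Points more than 100 m from their group's center (candidate outliers)
df[df$dist < 0.1, ]  # Points to keep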
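If you would rather keep the list structure and go straight to one final coordinate pair per item, something along these lines might work. It is a sketch, not a tested solution for your data: `haversine_km` repeats the same formula as `earth.dist` so that the block runs on its own, while `final_centers` and the `demo` list are made-up names and data for illustration.

```r
# Haversine distance in km (same formula as earth.dist above; vectorised)
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6378.145) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical helper: for each data frame in the list, drop points farther
# than max_km from that item's mean center, then average what is left
final_centers <- function(dfs, max_km = 0.1) {
  t(sapply(dfs, function(d) {
    keep <- haversine_km(d$lon, d$lat, mean(d$lon), mean(d$lat)) <= max_km
    c(lon = mean(d$lon[keep]), lat = mean(d$lat[keep]), n_kept = sum(keep))
  }))
}

# Small made-up list standing in for dfList
demo <- list(
  a = data.frame(lon = c(-117.23041, -117.23040, -117.23039),
                 lat = c(32.81217, 32.81218, 32.81219)),
  b = data.frame(lon = c(-117.22864, -117.22862, -117.22860, -117.22600),
                 lat = c(32.81257, 32.81257, 32.81257, 32.81257))
)
centers <- final_centers(demo)
print(centers)
```

The result is a matrix with one row per list item, so `final_centers(dfList)` should give one `lon`/`lat` pair for each identifier. Items where no point survives the threshold come back as `NaN`.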
