Removing spatial outliers (lat and long coordinates) in R

Question

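I have read up on this as best I can and I think I have found the most appropriate process, but if anyone has other ideas, functions, or a different approach, it would be much appreciated. I have a list of small data frames with differing numbers of rows, each containing several latitude and longitude coordinates in separate columns. For each item in the list I need to remove a possible outlier coordinate pair and then find the mean center of the remaining coordinates (so each item in the list should end up with a single coordinate pair).

The method I have read about is to find the mean center of all the latitudes and longitudes separately, calculate the Euclidean distance from that mean center to each coordinate pair, and remove the points that lie beyond a chosen distance (say 100 m). The mean center of the remaining points is then the final result. This seems a little convoluted to me, though, so any better suggestions for removing coordinate outliers would be welcome.

Here is the code I have so far: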
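The procedure described above can be sketched for a single data frame as follows. This is only a minimal sketch: the helper name `trimmed_center`, its `max_m` argument, the toy data, and the flat-earth (equirectangular) conversion from degrees to metres are all assumptions made for illustration, and the conversion is only reasonable over short distances such as the 100 m used here.

```r
# Hypothetical helper: drop points farther than `max_m` metres from the
# mean center, then return the mean center of the points that remain.
# Assumes columns `lon` and `lat` in decimal degrees.
trimmed_center <- function(d, max_m = 100) {
  lon0 <- mean(d$lon)
  lat0 <- mean(d$lat)
  # Approximate metres per degree near lat0 (equirectangular approximation)
  m_per_deg_lat <- 111320
  m_per_deg_lon <- 111320 * cos(lat0 * pi / 180)
  dx <- (d$lon - lon0) * m_per_deg_lon
  dy <- (d$lat - lat0) * m_per_deg_lat
  keep <- sqrt(dx^2 + dy^2) <= max_m
  c(lon = mean(d$lon[keep]), lat = mean(d$lat[keep]))
}

# Toy data (made up): three nearby points and one a few hundred metres east
pts <- data.frame(lon = c(-117.2300, -117.2301, -117.2299, -117.2270),
                  lat = c(32.8120, 32.8121, 32.8119, 32.8120))
res <- trimmed_center(pts, max_m = 100)
print(res)
```

Note that a single distant point pulls the mean center toward itself, so a very far outlier can push every point past the threshold. Using the median as the reference center is one possible way to soften that effect.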

dfList <- structure(list(`43` = structure(list(date = c("43 2011-04-06", "43 2011-04-07", "43 2011-04-08"), identifier = c(43, 43, 43), lon = c(-117.23041303, -117.23040817, -117.23039471), lat = c(32.81217294, 32.81218158, 32.81218645)), .Names = c("date", "identifier", "lon", "lat"), row.names = 13:15, class = "data.frame"), `44` = structure(list(date = c("44 2011-04-06", "44 2011-04-07", "44 2011-04-08"), identifier = c(44, 44, 44), lon = c(-117.22864227, -117.22861559, -117.22862265), lat = c(32.81257756, 32.81257089, 32.81257197)), .Names = c("date", "identifier", "lon", "lat"), row.names = 19:21, class = "data.frame"), `46` = structure(list(date = c("46 2011-04-06", "46 2011-04-07", "46 2011-04-08", "46 2011-04-09", "46 2011-04-10", "46 2011-04-11"), identifier = c(46, 46, 46, 46, 46, 46), lon = c(-117.22992617, -117.2289396895, -117.22965116, -117.23003928, -117.229922602, -117.22969664), lat = c(32.81295118, 32.8128226975, 32.81317299, 32.81224457, 32.813018734, 32.81276993)), .Names = c("date", "identifier", "lon", "lat"), row.names = 25:30, class = "data.frame"), `47` = structure(list(date = c("47 2011-04-06", "47 2011-04-07"), identifier = c(47, 47), lon = c(-117.2274484, -117.22747116), lat = c(32.81205838, 32.81207607)), .Names = c("date", "identifier", "lon", "lat"), row.names = 31:32, class = "data.frame")), .Names = c("43", "44", "46", "47"))

lonMean <- lapply(dfList, function(x) mean(x$lon))          # mean longitude per item
latMean <- lapply(dfList, function(x) mean(x$lat))          # mean latitude per item
latLon <- mapply(c, lonMean, latMean, SIMPLIFY = FALSE)     # combine both lists into one list of (lon, lat) pairs

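EDIT: What I need now is to compute the distance between every coordinate in each item of the first list and the matching mean coordinate in the second list, and then remove from the first list every point whose distance is greater than 100. I have used dist and geodist (from the 'gmt' package) before, but I am not sure how to use them with these two lists. After that I would like to reduce the possible outliers further. Many thanks in advance; I am not the savviest R user, so any help is greatly appreciated!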
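One way to pair each data frame with its matching mean coordinate is to walk the two lists in parallel with `Map`. The sketch below is illustrative only: `haversine_m`, `toyList`, `centers`, and `trimmed` are hypothetical names, the data are made up, and a mean Earth radius of 6371 km is assumed.

```r
# Great-circle (haversine) distance in metres; inputs in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Toy stand-in for dfList (values made up for illustration)
toyList <- list(
  a = data.frame(lon = c(-117.2300, -117.2301, -117.2299, -117.2270),
                 lat = c(32.8120, 32.8121, 32.8119, 32.8120)),
  b = data.frame(lon = c(-117.2286, -117.2286),
                 lat = c(32.8125, 32.8126))
)

# Per-item mean centers, built the same way as latLon in the question
centers <- mapply(c, lapply(toyList, function(x) mean(x$lon)),
                  lapply(toyList, function(x) mean(x$lat)), SIMPLIFY = FALSE)

# Walk both lists in parallel: keep rows within 100 m of the matching center
trimmed <- Map(function(d, ctr) {
  d$dist_m <- haversine_m(d$lon, d$lat, ctr[1], ctr[2])
  d[d$dist_m <= 100, ]
}, toyList, centers)

sapply(trimmed, nrow)
```

Because `Map` matches elements by position, the two lists must be in the same order, which holds when the centers are derived from the data list itself.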

Answer 1

Try this.

df <- do.call("rbind", dfList) # Flattens list into data frame, preserving 
                               # group identifier

# This function calculates distance in kilometers between two points
earth.dist <- function (long1, lat1, long2, lat2)
{
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}

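# Distance (km) from each point to the mean center of its own group
df$dist <- earth.dist(df$lon, df$lat,
                      ave(df$lon, df$identifier), ave(df$lat, df$identifier))

df[df$dist >= 0.1,] # Rows 100 m or more from their group's center (the outliers)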
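The question also asks for the mean center of the points that remain after filtering. One possible continuation, assuming a flattened data frame with `identifier`, `lon`, `lat`, and a `dist` column in kilometres like the one built above, is to keep the complement of the filter and average per group with `aggregate`. The names `df_toy`, `kept`, and `final` and all values below are made up for illustration.

```r
# Toy flattened data with a precomputed `dist` column in km (values made up)
df_toy <- data.frame(
  identifier = c(43, 43, 43, 46, 46, 46),
  lon  = c(-117.2304, -117.2304, -117.2300, -117.2299, -117.2289, -117.2297),
  lat  = c(32.8122, 32.8122, 32.8120, 32.8130, 32.8128, 32.8128),
  dist = c(0.01, 0.01, 0.15, 0.03, 0.12, 0.02)
)

# Drop rows that are 100 m or more from the center
kept <- df_toy[df_toy$dist < 0.1, ]

# One mean (lon, lat) pair per identifier, computed from the remaining rows
final <- aggregate(cbind(lon, lat) ~ identifier, data = kept, FUN = mean)
final
```

The result has one row per identifier, which matches the single coordinate pair per list item that the question asks for.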

Solution 1
3 · accepted · 2014-06-26 21:27:28
