
How do I automatically determine a state given the latitude and longitude / coordinates?

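I have a data frame of ~17,000 lat/lon values that I want to use to populate a new column with the corresponding state.

So far I have tried several solutions proposed in other Stack Overflow answers (too many to list here, but more than ten), and none of them worked for me.

The closest I have come to a solution is the ggmap package, but I am warned that I have exceeded the limit even though I am sending only a single lat/lon value.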

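I have separate lat and lon values, and I have even combined them into a single lat,lon string; even so, none of the solutions mentioned above work for me.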
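For reference, the two layouts mentioned above (one combined "lat,lon" string versus separate numeric columns) can be converted into each other with base R. This is only a minimal sketch using three values from the sample data; the object names `coords`, `parts`, `latlon`, and `recombined` are made up for illustration.

```r
# Three coordinate strings taken from the sample data
coords <- c("31.9618,-83.0588", "44.8782,-69.4718", "61.6303,-149.8181")

# Split each "lat,lon" string at the comma and stack the pieces into a matrix
parts <- do.call(rbind, strsplit(coords, ",", fixed = TRUE))

# Separate numeric columns, named like the columns in the question
latlon <- data.frame(
  origin_coords = coords,
  origin_lat    = as.numeric(parts[, 1]),
  origin_lon    = as.numeric(parts[, 2])
)

# Going the other way: rebuild the combined string from the numeric columns
recombined <- paste(latlon$origin_lat, latlon$origin_lon, sep = ",")
```

Most spatial functions expect longitude first (x) and latitude second (y), so the separate numeric columns are usually the more convenient form.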

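What I want to do is determine the state from the given lat/lon (coordinate) values and save it in a new column (df$state).

I initially matched on city names to get the state, but because multiple states contain cities with the same name, the matching stopped after the first successful match; as a result, I ended up with more than 2,800 cities attributed to AK even though they are thousands of miles away.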
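The first-match problem described above is easy to reproduce. The sketch below assumes the lookup was done with something like base R's match(), which is only a guess at how the matching was implemented; the lookup table `cities` is hypothetical.

```r
# Hypothetical lookup table: the same city name occurs in more than one state
cities <- data.frame(
  city  = c("Springfield", "Springfield", "Portland", "Portland"),
  state = c("IL", "MA", "ME", "OR")
)

# match() returns only the position of the FIRST occurrence of each name,
# so every "Springfield" gets "IL" and every "Portland" gets "ME"
first_hit <- cities$state[match(c("Springfield", "Portland"), cities$city)]

# City names that map to more than one state cannot be resolved by name alone
ambiguous <- names(which(table(cities$city) > 1))
```

Because a city name does not identify a state uniquely, a lookup by coordinates is the more reliable route.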

Any suggestions would be great.

Here are the first 100 rows of the coords, lat, and lon columns of my data:

structure(list(origin_coords = c("31.9618,-83.0588", "44.8782,-69.4718", 
"37.3894,-121.8868", "36.0485,-93.5044", "37.652,-120.7292", 
"33.7942,-84.2018", "32.0749,-81.0883", "31.0286,-97.6115", "40.7559,-111.8967", 
"39.8359,-91.7538", "35.922,-80.537", "39.8036,-75.0058", "43.072,-83.8424", 
"33.5207,-86.8025", "26.1216,-80.1288", "31.9618,-83.0588", "31.9618,-83.0588", 
"61.6303,-149.8181", "33.8687,-84.3351", "42.2196,-88.2426", 
"31.7943,-85.5581", "28.3067,-80.6862", "39.1157,-94.6271", "33.831,-85.7752", 
"39.2655,-76.4935", "32.9824,-87.7919", "61.6303,-149.8181", 
"31.086,-85.7192", "31.9618,-83.0588", "39.9048,-75.2946", "34.1132,-117.3771", 
"41.905,-71.1026", "42.3921,-97.4751", "31.2627,-86.7711", "42.5864,-71.4401", 
"33.7935,-93.807", "39.0097,-123.6523", "61.6303,-149.8181", 
"37.7235,-85.9769", "38.0624,-87.2452", "37.7166,-121.9226", 
"42.9993,-88.2196", "40.6316,-74.0927", "43.0892,-77.436", "39.8359,-91.7538", 
"38.5487,-89.5413", "35.833,-90.6965", "41.363,-89.0008", "37.7953,-95.9368", 
"33.4581,-83.0802", "33.7546,-93.6735", "32.7491,-96.4598", "41.8858,-87.6181", 
"40.7328,-74.0755", "31.2627,-86.7711", "31.9618,-83.0588", "61.6303,-149.8181", 
"38.4642,-85.7775", "40.6344,-92.9219", "37.8366,-89.1424", "42.5648,-83.0701", 
"39.5394,-76.3564", "33.8687,-84.3351", "41.4564,-90.7235", "42.0122,-87.8417", 
"38.8339,-104.8214", "36.4442,-92.5832", "39.838,-104.9988", 
"41.8378,-87.7602", "28.3051,-81.4242", "41.6052,-71.9808", "40.7808,-80.0592", 
"40.5364,-89.1885", "31.9618,-83.0588", "40.8915,-74.0119", "43.2078,-91.2976", 
"34.4574,-83.476", "36.4105,-92.1951", "40.0177,-75.2594", "36.0557,-96.0602", 
"44.694,-85.6763", "61.6303,-149.8181", "40.7446,-73.9345", "29.1989,-82.0874", 
"26.6048,-80.2149", "34.6909,-118.1491", "39.0289,-95.2086", 
"35.4074,-93.1355", "36.2523,-92.6907", "45.2097,-123.2043", 
"37.7953,-95.9368", "61.6303,-149.8181", "39.1157,-94.6271", 
"33.5793,-86.6375", "40.3757,-86.3201", "40.6344,-92.9219", "39.8359,-91.7538", 
"42.3921,-97.4751", "41.2564,-73.2111", "44.2767,-121.1896"), 
    origin_lat = c(31.9618, 44.8782, 37.3894, 36.0485, 37.652, 
    33.7942, 32.0749, 31.0286, 40.7559, 39.8359, 35.922, 39.8036, 
    43.072, 33.5207, 26.1216, 31.9618, 31.9618, 61.6303, 33.8687, 
    42.2196, 31.7943, 28.3067, 39.1157, 33.831, 39.2655, 32.9824, 
    61.6303, 31.086, 31.9618, 39.9048, 34.1132, 41.905, 42.3921, 
    31.2627, 42.5864, 33.7935, 39.0097, 61.6303, 37.7235, 38.0624, 
    37.7166, 42.9993, 40.6316, 43.0892, 39.8359, 38.5487, 35.833, 
    41.363, 37.7953, 33.4581, 33.7546, 32.7491, 41.8858, 40.7328, 
    31.2627, 31.9618, 61.6303, 38.4642, 40.6344, 37.8366, 42.5648, 
    39.5394, 33.8687, 41.4564, 42.0122, 38.8339, 36.4442, 39.838, 
    41.8378, 28.3051, 41.6052, 40.7808, 40.5364, 31.9618, 40.8915, 
    43.2078, 34.4574, 36.4105, 40.0177, 36.0557, 44.694, 61.6303, 
    40.7446, 29.1989, 26.6048, 34.6909, 39.0289, 35.4074, 36.2523, 
    45.2097, 37.7953, 61.6303, 39.1157, 33.5793, 40.3757, 40.6344, 
    39.8359, 42.3921, 41.2564, 44.2767), origin_lon = c(-83.0588, 
    -69.4718, -121.8868, -93.5044, -120.7292, -84.2018, -81.0883, 
    -97.6115, -111.8967, -91.7538, -80.537, -75.0058, -83.8424, 
    -86.8025, -80.1288, -83.0588, -83.0588, -149.8181, -84.3351, 
    -88.2426, -85.5581, -80.6862, -94.6271, -85.7752, -76.4935, 
    -87.7919, -149.8181, -85.7192, -83.0588, -75.2946, -117.3771, 
    -71.1026, -97.4751, -86.7711, -71.4401, -93.807, -123.6523, 
    -149.8181, -85.9769, -87.2452, -121.9226, -88.2196, -74.0927, 
    -77.436, -91.7538, -89.5413, -90.6965, -89.0008, -95.9368, 
    -83.0802, -93.6735, -96.4598, -87.6181, -74.0755, -86.7711, 
    -83.0588, -149.8181, -85.7775, -92.9219, -89.1424, -83.0701, 
    -76.3564, -84.3351, -90.7235, -87.8417, -104.8214, -92.5832, 
    -104.9988, -87.7602, -81.4242, -71.9808, -80.0592, -89.1885, 
    -83.0588, -74.0119, -91.2976, -83.476, -92.1951, -75.2594, 
    -96.0602, -85.6763, -149.8181, -73.9345, -82.0874, -80.2149, 
    -118.1491, -95.2086, -93.1355, -92.6907, -123.2043, -95.9368, 
    -149.8181, -94.6271, -86.6375, -86.3201, -92.9219, -91.7538, 
    -97.4751, -73.2111, -121.1896)), row.names = c(NA, 100L), class = "data.frame")

Use the over function from the sp package:

library(geojsonio)
library(sp)

# get usa polygon data
# http://eric.clst.org/tech/usgeojson/
usa <- geojson_read(
  "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json", 
  what = "sp"
)

df$state <- NA

# look up the state polygon that contains each point, one row at a time
for (i in seq_len(nrow(df))) {
  coords <- c(df$origin_lon[i], df$origin_lat[i])
  if(any(is.na(coords))) next
  point <- sp::SpatialPoints(
    matrix(
      coords,
      nrow = 1
    )
  )
  sp::proj4string(point) <- sp::proj4string(usa)
  polygon_check <- sp::over(point, usa)
  df$state[i] <- as.character(polygon_check$NAME)
}

> head(df)
      origin_coords origin_lat origin_lon      state
1  31.9618,-83.0588    31.9618   -83.0588    Georgia
2  44.8782,-69.4718    44.8782   -69.4718      Maine
3 37.3894,-121.8868    37.3894  -121.8868 California
4  36.0485,-93.5044    36.0485   -93.5044   Arkansas
5  37.652,-120.7292    37.6520  -120.7292 California
6  33.7942,-84.2018    33.7942   -84.2018    Georgia
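With ~17,000 rows, calling over once per row can be slow. Since over accepts many points at once, the loop above could probably be replaced by a single call. The helper below is a sketch, not part of the original answer: the name lookup_state is made up, and it assumes, like the loop, that the points are in the same coordinate system as the polygons and that the polygon data has a NAME column.

```r
library(sp)

# Hypothetical helper: find the polygon NAME for all points in one over() call.
# `polygons` is a SpatialPolygonsDataFrame with a NAME column, such as `usa`.
lookup_state <- function(lon, lat, polygons) {
  state <- rep(NA_character_, length(lon))
  ok <- !is.na(lon) & !is.na(lat)            # skip rows with missing coordinates
  if (!any(ok)) return(state)
  pts <- SpatialPoints(cbind(lon[ok], lat[ok]))
  # assumption carried over from the loop: points use the polygons' CRS
  proj4string(pts) <- proj4string(polygons)
  # over() returns one row of polygon attributes per point (NA if no polygon matches)
  state[ok] <- as.character(over(pts, polygons)$NAME)
  state
}

# Intended use with the objects from the answer above:
# df$state <- lookup_state(df$origin_lon, df$origin_lat, usa)
```

Points that fall outside every polygon, or rows with missing coordinates, come back as NA.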

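Here is an sf solution that uses a spatial join (st_join) between a spatial object holding the US states, states_sf (created from the USAboundaries package), and a spatial object holding the data points, points_sf.

Please verify the results, as I am new to spatial work in R.

Afterwards, just filter the resulting data.frame down to the columns you need.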

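library(sf)
library(USAboundaries)

# current US state boundaries (low resolution), transformed to WGS84 (EPSG:4326)
states_sf <- st_transform(us_states(map_date = NULL, resolution = "low", states = NULL), 4326)
# df is the data frame from the question; build point geometries from its lon/lat columns
points_sf <- st_as_sf(df, coords = c("origin_lon", "origin_lat"), crs = 4326, agr = "constant")
# attach the attributes of the state polygon that each point intersects
result <- as.data.frame(st_join(points_sf, states_sf, join = st_intersects))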


# > head(result)
#       origin_coords statefp  statens    affgeoid geoid stusps       name lsad        aland      awater state_name state_abbr jurisdiction_type                  geometry
# 1  31.9618,-83.0588      13 01705317 0400000US13    13     GA    Georgia   00 149169848456  4741100880    Georgia         GA             state  POINT (-83.0588 31.9618)
# 2  44.8782,-69.4718      23 01779787 0400000US23    23     ME      Maine   00  79885221885 11748755195      Maine         ME             state  POINT (-69.4718 44.8782)
# 3 37.3894,-121.8868      06 01779778 0400000US06    06     CA California   00 403501101370 20466718403 California         CA             state POINT (-121.8868 37.3894)
# 4  36.0485,-93.5044      05 00068085 0400000US05    05     AR   Arkansas   00 134771517596  2960191698   Arkansas         AR             state  POINT (-93.5044 36.0485)
# 5  37.652,-120.7292      06 01779778 0400000US06    06     CA California   00 403501101370 20466718403 California         CA             state  POINT (-120.7292 37.652)
# 6  33.7942,-84.2018      13 01705317 0400000US13    13     GA    Georgia   00 149169848456  4741100880    Georgia         GA             state  POINT (-84.2018 33.7942)
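The filtering step mentioned above might look like the following. To keep the sketch self-contained, it builds a small stand-in for result from three rows of the printed output with only a few of the columns; with the real result object, only the last assignment is needed. The name states_only is made up.

```r
# Stand-in for the first rows of `result`, copied from the output above
result <- data.frame(
  origin_coords = c("31.9618,-83.0588", "44.8782,-69.4718", "37.3894,-121.8868"),
  statefp       = c("13", "23", "06"),
  state_name    = c("Georgia", "Maine", "California"),
  state_abbr    = c("GA", "ME", "CA")
)

# Keep only the original coordinates and the state abbreviation
states_only <- result[, c("origin_coords", "state_abbr")]
```

If every point matched at most one state, result has one row per input row, so the abbreviation could also be copied straight into df$state; it is worth comparing nrow(result) with nrow(df) before doing that.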

