
Efficient way to get country & state/province names from longitude and latitude in R?

I have a huge data frame (about 1 million data points) with longitude and latitude information. I would like to get the country and state/province for each point. However, my code is not as fast as I expected.

Below is my code:

Sample data frame:

df = data.frame(
     ID = c("A00001","A00002","A00003","A00004","A00005"),
     longitude = c(-98.84295,-91.11844,-75.91037,-71.00733,-92.29651),
     latitude = c(43.98332,40.17851,39.26118,46.70087,45.49510)
     )

First: read the geographic information

library(sp)
library(rgdal)
library(dplyr)

countries_map<- readOGR(dsn="Country", layer="ne_10m_admin_0_countries")
states_map <- readOGR(dsn="States", layer="ne_10m_admin_1_states_provinces")

Then, build a function and bind the result to the data frame

geo_to_location <- function(lat, long){
  # Convert the coordinates to SpatialPoints (x = longitude, y = latitude)
  points <- SpatialPoints(data.frame(long, lat))

  # Give the points the same CRS as the countries map, then find the polygon containing each point
  proj4string(points) <- proj4string(countries_map)
  country <- as.character(over(points, countries_map)$NAME)

  # The same for state/province
  proj4string(points) <- proj4string(states_map)
  state <- as.character(over(points, states_map)$name)

  # Return a one-row data frame so that map2_dfr can stack the results
  dplyr::bind_rows(setNames(c(country, state), c("Country", "State")))
}

df = df  %>% dplyr::bind_cols(purrr::map2_dfr(.$latitude, .$longitude, geo_to_location ))

This method works, but 400,000 points already take about 30 minutes to complete, and I have more than 400k points to process. Is there a more efficient way to handle this?

Or is there no more efficient way to do this work?

Thank you all in advance.

Thanks to @starja, who suggested vectorizing the function and using data.table in place of dplyr.

I tested on the first 500 rows and saw a huge difference in run time.

Below is the modified code:

geo_to_location <- function(lat, long){
  # Convert all coordinates to SpatialPoints at once (x = longitude, y = latitude)
  points <- SpatialPoints(data.frame(long, lat))

  # Give the points the same CRS as the countries map, then find the polygon containing each point
  proj4string(points) <- proj4string(countries_map)
  country <- as.character(over(points, countries_map)$NAME)

  # The same for state/province
  proj4string(points) <- proj4string(states_map)
  state <- as.character(over(points, states_map)$name)

  # Return two vectors, one element per input point
  return(list(country = country, state = state))
}

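library(data.table)

df = as.data.table(df)
# One call on the full latitude/longitude columns; the two list elements become two new columns
df[, c("Country","State_Province") := geo_to_location(latitude, longitude)]
df = as.data.frame(df)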
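As far as I understand it, the difference between the two versions is mainly call overhead: the first version builds a SpatialPoints object, assigns a CRS and runs over() once per point, while the second does each of those steps once for the whole column. The toy sketch below illustrates that pattern in base R only. Note that toy_lookup is a hypothetical stand-in that classifies by longitude, not the real spatial lookup, and the sample size is arbitrary.

```r
# Hypothetical stand-in for the spatial lookup (assumption: NOT the real over() call)
toy_lookup <- function(lat, long) {
  ifelse(long < -85, "West", "East")
}

set.seed(1)
n <- 5000
lat  <- runif(n, 35, 50)
long <- runif(n, -100, -65)

# Row by row: one function call, with its fixed overhead, per point
t_loop <- system.time(
  by_row <- vapply(seq_len(n), function(i) toy_lookup(lat[i], long[i]), character(1))
)

# Vectorized: a single call on the whole vectors
t_vec <- system.time(
  all_at_once <- toy_lookup(lat, long)
)

# Both approaches give the same answer; only the elapsed time differs
print(identical(by_row, all_at_once))
print(c(row_by_row = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

The absolute timings depend on the machine, so the gap in this toy case says nothing about the size of the gap for the real lookup.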

The original method took about 3.194 minutes to process 500 points. The new method took about 0.651 seconds. If there is an even more efficient way to handle this, please let me know so that I can learn a more advanced technique.
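One option that might be worth trying is a point-in-polygon join with the sf package, which is also vectorized. The sketch below is untested on the full data set and uses two made-up rectangular regions ("West" and "East", both hypothetical) in place of the Natural Earth layers, so that it runs without any shapefiles. I have not timed it against the data.table version.

```r
library(sf)

df <- data.frame(
  ID = c("A00001", "A00002", "A00003", "A00004", "A00005"),
  longitude = c(-98.84295, -91.11844, -75.91037, -71.00733, -92.29651),
  latitude = c(43.98332, 40.17851, 39.26118, 46.70087, 45.49510)
)

# Two toy rectangles standing in for real boundaries (assumption, for illustration only)
west <- st_polygon(list(rbind(c(-100, 35), c(-85, 35), c(-85, 50), c(-100, 50), c(-100, 35))))
east <- st_polygon(list(rbind(c(-85, 35), c(-65, 35), c(-65, 50), c(-85, 50), c(-85, 35))))
regions <- st_sf(region = c("West", "East"),
                 geometry = st_sfc(west, east, crs = 4326))

# Turn the data frame into point geometries, keeping the coordinate columns
pts <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)

# One spatial join for all points: each point picks up the attributes of the polygon it falls in
joined <- st_join(pts, regions, join = st_intersects)

# Drop the geometry column to get back to a plain data frame
result <- st_drop_geometry(joined)
print(result)
```

With the real data, the toy regions would be replaced by layers read with st_read(dsn = "Country", layer = "ne_10m_admin_0_countries") and the states equivalent, and the join would be run once per layer.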

Again, thank you for the suggestion and help.
