Geocoding with R: Errors stopping program altogether
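I have a working program that pulls addresses from a list in Excel and geocodes them using the Google API, but whenever it reaches an address with an apartment, a unit, or an address that can't be found, it stops the program. I can't get a workable tryCatch routine going inside my loop. :(
Here's the code: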
library("readxl")
library(ggplot2)
library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read_excel(fileToLoad, sheet = "Sheet1")
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "geocoded1.csv", row.names=FALSE)
Here's the error message:
Warning: Geocoding "[removed address]" failed with error:
You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account
Error: Can't subset columns that don't exist.
x Location 3 doesn't exist.
i There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Unknown or uninitialised column: `lon`.
2: Unknown or uninitialised column: `lat`.
3: Unknown or uninitialised column: `geoAddress`.
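Now, this isn't an API key error, because the key works in calls made after the error, and the program stops at any address that ends in a number after the street name.
I'm going to be processing batches of thousands of addresses each month, and they won't all be perfect, so I need to be able to skip these bad addresses, put "NA" in the lon/lat columns, and move on.
I'm new to R and can't put together a workable error-handling routine for these kinds of errors. Can anyone point me in the right direction? Thanks in advance.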
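When geocode can't find an address and output = "latlona", the address field is not returned, so the result has only two columns and result[3] fails. Your code will work with the following modification.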
#
# example data
#
origAddress <- data.frame(addresses = c("white house, Washington",
"white house, # 100, Washington",
"white hose, Washington",
"Washington Apartments, Washington, DC 20001",
"1278 7th st nw, washington, dc 20001") )
#
# simple fix for fatal error
#
for(i in 1:nrow(origAddress))
{
result <- geocode(origAddress$addresses[i], output = "latlona",
source = "google")
origAddress$lon[i] <- result$lon[1]
origAddress$lat[i] <- result$lat[1]
origAddress$geoAddress[i] <- ifelse( is.na(result$lon[1]), NA, result$address[1] )
}
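However, you mention that some of your addresses may not be exact. Google's geocoding will try to interpret every address you supply. Sometimes it fails and returns NA, but other times its interpretation may be wrong, so you should always check the geocode results. A simple method, which will catch many errors, is to set output = "more" in geocode and then examine the values returned in the loctype column. If loctype != "rooftop", you may have a problem. Examining the type column will give you more information. This check is not exhaustive. For a more complete check, you could use output = "all" to return all the data Google supplies for an address, but this requires parsing a moderately complex list. You should read more about the data returned by Google geocoding at https://developers.google.com/maps/documentation/geocoding/overview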
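Parsing the list returned with output = "all" could look roughly like the minimal sketch below. It assumes each response mirrors the JSON documented at the URL above: a status string plus a results list whose entries carry formatted_address and geometry. The sample list is hand-built for illustration and is not real API output, and summarise_geocode is a hypothetical helper name, not part of ggmap. Note that the raw response spells the location type in uppercase ("ROOFTOP").

```r
# Hand-built stand-in for one geocoding response; it follows the documented
# JSON layout but is NOT real API output.
sample_response <- list(
  status = "OK",
  results = list(
    list(
      formatted_address = "1600 Pennsylvania Avenue NW, Washington, DC 20500, USA",
      geometry = list(
        location = list(lat = 38.8977, lng = -77.0365),
        location_type = "ROOFTOP"
      )
    )
  )
)

# Hypothetical helper: flatten the first result into a one-row data frame.
# Returns a row of NAs when the status is not "OK" or nothing came back.
summarise_geocode <- function(resp) {
  empty <- data.frame(address = NA_character_, lon = NA_real_, lat = NA_real_,
                      loctype = NA_character_, n_results = 0L,
                      stringsAsFactors = FALSE)
  if (!is.list(resp) || !identical(resp$status, "OK") ||
      length(resp$results) == 0) {
    return(empty)
  }
  first <- resp$results[[1]]
  data.frame(address   = first$formatted_address,
             lon       = first$geometry$location$lng,
             lat       = first$geometry$location$lat,
             loctype   = first$geometry$location_type,
             n_results = length(resp$results),  # more than 1 hints at ambiguity
             stringsAsFactors = FALSE)
}

parsed  <- summarise_geocode(sample_response)
missing <- summarise_geocode(list(status = "ZERO_RESULTS", results = list()))
```

Keeping n_results is a design choice: an address that matches several results is worth a manual look even when the first match is a rooftop hit.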
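Also, geocode will take at least tens of minutes to return results for thousands of addresses. To minimize the response time, you should pass the addresses to geocode as a single character vector. It then returns a data frame of results, which you can use to update your origAddress data frame and check for errors, as shown below.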
#
# Solution should check for wrongly interpreted addresses
#
# see https://developers.google.com/maps/documentation/geocoding/overview
# for more information on fields returned by google geocoding
#
# return all addresses in single call to geocode
#
origAddress <- data.frame(addresses = c("white house, Washington", # identified by name
"white hose, Washington", # misspelling
"Washington Apartments, apt 100, Washington, DC 20001", # identified by name of apartment building
"Washington Apartments, # 100, Washington, DC 20001", # invalid apartment number specification
"1206 7th st nw, washington, dc 20001") ) # address on street but no structure with that address
result <- suppressWarnings(geocode(location = origAddress$addresses,
output = "more",
source = "google") )
origAddress <- cbind(origAddress, result[, c("address", "lon","lat","type", "loctype")])
#
# Addresses which need to be checked
#
check_addresses <- origAddress[ origAddress$loctype != "rooftop" |
is.na(origAddress$loctype), ]
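Since the question asks about a tryCatch routine inside the loop, here is a minimal sketch of that pattern. So that it runs without an API key, it uses a stand-in function, fake_geocode (hypothetical, not part of ggmap), which raises an error on some inputs; in real use you would pass a function that calls geocode instead. safe_geocode is likewise just an illustrative name.

```r
# Stand-in for geocode(): hypothetical, used only so the sketch runs offline.
# It raises an error for any address containing "#"; otherwise it returns a
# one-row data frame with made-up coordinates.
fake_geocode <- function(address) {
  if (grepl("#", address, fixed = TRUE)) stop("could not geocode: ", address)
  data.frame(lon = -77.0, lat = 38.9, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Wrap a single lookup so that an error yields a row of NAs instead of
# stopping the whole run.
safe_geocode <- function(address, geocoder = fake_geocode) {
  na_row <- data.frame(lon = NA_real_, lat = NA_real_,
                       address = NA_character_, stringsAsFactors = FALSE)
  tryCatch({
    res <- geocoder(address)
    # Also guard against results that lack the expected columns.
    if (!all(c("lon", "lat", "address") %in% names(res))) {
      na_row
    } else {
      as.data.frame(res[1, c("lon", "lat", "address")])
    }
  }, error = function(e) {
    message("Skipping '", address, "': ", conditionMessage(e))
    na_row
  })
}

addresses <- c("white house, Washington", "white house, # 100, Washington")
rows <- lapply(addresses, safe_geocode)
geocoded <- data.frame(addresses = addresses, do.call(rbind, rows),
                       stringsAsFactors = FALSE)
```

Each bad address ends up as a row of NA values and the run continues. Something like safe_geocode(x, geocoder = function(a) geocode(a, output = "latlona", source = "google")) would plug the real call into the same wrapper, though that variant is untested here.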