I have a large number of Google Maps URLs and would like to extract a clean address from each one for geocoding. I recently found getURL() in the RCurl package, which returns a lot of information:
library(RCurl)
getURL("https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US")
but all I'm really interested in is the address snippet near the beginning of the getURL() output:
...<meta content=\"loc: 2440 Seattle, 98116 WA US - Google Maps\" property=\"og:title\">...
Update: I just realized the URL above is a bad example. Here is a different one:
getURL("https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US")
...<meta content=\"loc: 10056 Interlake Ave N seattle WA US - Google Maps\" property=\"og:title\">...
Does anyone have suggestions on how to do this efficiently? My apologies, I'm at an intermediate level with R and would appreciate your help. Thanks!
Tim
Use the Google Maps geocoding API with XML output as follows:
require(XML)
burl <- "http://maps.google.com/maps/api/geocode/xml?address="
address <- "2440 Seattle, 98116 WA US"
request <- paste0(burl,URLencode(address))
doc <- htmlTreeParse(request, useInternalNodes=TRUE)
# Address as interpreted by the geocoder
xmlValue(doc[["//formatted_address"]])
[1] "2440, Seattle-Tacoma International Airport (SEA), Seattle, WA 98158, USA"
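When parsing a response like this, it can help to check the status element before reading the address, since a failed lookup has no formatted_address node. Here is a minimal sketch that runs offline on a hand-written sample (abridged and for illustration only, not real API output). The helper name extract_formatted_address is hypothetical, and the sketch assumes the response carries a top-level status element with the value "OK" on success.

```r
library(XML)

# Hand-written sample in the shape of a geocode XML response (illustrative only).
sample_xml <- paste0(
  "<GeocodeResponse>",
  "<status>OK</status>",
  "<result><formatted_address>",
  "2440, Seattle-Tacoma International Airport (SEA), Seattle, WA 98158, USA",
  "</formatted_address></result>",
  "</GeocodeResponse>"
)

# Hypothetical helper: return the first formatted address, or NA if the
# status is not "OK" or no address node is present.
extract_formatted_address <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  status <- xpathSApply(doc, "//status", xmlValue)
  if (length(status) == 0 || status[1] != "OK") return(NA_character_)
  addr <- xpathSApply(doc, "//formatted_address", xmlValue)
  if (length(addr) == 0) NA_character_ else addr[1]
}

extract_formatted_address(sample_xml)
```

Returning NA rather than stopping with an error makes the helper easier to apply over many addresses, where some lookups may fail.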
EDIT
If you only have the encoded URL, use URLdecode to decode it instead of downloading the URL:
URL <- "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US"
URL <- gsub(".*loc","",URL) # Get rid of https://...
URL <- URLdecode(URL)
gsub("[:]|[+]", " ", URL) # Get rid of ":" and "+"
[1] " 2440 Seattle, 98116 WA US"
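Since the question involves many URLs, the same idea can be wrapped in a vectorized function. This is a minimal sketch in base R only. The name address_from_url is hypothetical, and it assumes every URL carries the address in a q= parameter prefixed with "loc:", as in the two examples from the question.

```r
# Hypothetical helper: pull the address out of Google Maps URLs of the form
# https://maps.google.com/?q=loc%3A+<address>, without downloading anything.
address_from_url <- function(urls) {
  query <- sub("^.*[?&]q=", "", urls)          # keep only the q= value
  query <- sub("&.*$", "", query)              # drop any further parameters
  query <- gsub("+", " ", query, fixed = TRUE) # "+" encodes a space in a query string
  # Decode one element at a time, so the result has one entry per URL.
  decoded <- vapply(query, URLdecode, character(1), USE.NAMES = FALSE)
  decoded <- sub("^loc:", "", decoded)         # strip the "loc:" prefix
  trimws(decoded)                              # remove leading/trailing spaces
}

urls <- c(
  "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US",
  "https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US"
)
address_from_url(urls)
# [1] "2440 Seattle, 98116 WA US"           "10056 Interlake Ave N seattle WA US"
```

Replacing "+" before decoding means a literal plus sign encoded as %2B would survive, and trimws() removes the leading space left behind after the "loc:" prefix.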