
R: extract address using getURL()

I have a large number of Google Maps URLs and would like to extract a clean address from each one for geocoding. I recently found getURL() in the RCurl package, which returns a lot of information:

library(RCurl)

getURL("https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US")

but all I'm really interested in is the address snippet near the start of the getURL() output:

...<meta content=\"loc: 2440 Seattle, 98116 WA US - Google Maps\" property=\"og:title\">...

Update: I just realized the URL above is a bad example; here's a different one:

getURL("https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US")

...<meta content=\"loc: 10056 Interlake Ave N seattle WA US - Google Maps\" property=\"og:title\">...

Does anyone have suggestions on how to do this efficiently? My apologies, I'm an intermediate R user and would appreciate your help. Thanks!!

Tim

Use the Google Maps geocoding API with XML output, as follows:

require(XML)

burl <- "http://maps.google.com/maps/api/geocode/xml?address="
address <- "2440 Seattle, 98116 WA US"
request <- paste0(burl,URLencode(address))

doc <- htmlTreeParse(request, useInternalNodes=TRUE)
# Interpreted address
xmlValue(doc[["//formatted_address"]])
[1] "2440, Seattle-Tacoma International Airport (SEA), Seattle, WA 98158, USA"

EDIT
If you only have the encoded URL, use URLdecode() to decode it instead of downloading the page:

URL <- "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US"
URL <- gsub(".*loc","",URL) # Get rid of https://...
URL <- URLdecode(URL)
gsub("[:]|[+]", " ", URL) # Get rid of ":" and "+"
[1] "  2440 Seattle, 98116 WA US"
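
Since the question mentions many URLs, the decoding steps above could be wrapped in a small helper and applied to a whole vector. This is only a sketch: extract_address() is a hypothetical name, and it assumes every URL carries the address in a "q=loc:..." query parameter like the two examples in the question. It converts "+" to a space before decoding so that an encoded literal plus sign (%2B) would survive, and it trims the leading whitespace seen in the output above.

```r
# Hypothetical helper (not part of any package): pull the address out of
# Google Maps "?q=loc:..." URLs without downloading anything.
extract_address <- function(urls) {
  # Keep only the value of the "q" parameter (assumed to be present)
  query <- sub("^.*[?&]q=", "", urls)
  # Drop any further parameters that might follow the address
  query <- sub("&.*$", "", query)
  # "+" stands for a space in a query string; convert before decoding
  query <- gsub("+", " ", query, fixed = TRUE)
  # Decode the %xx escapes one element at a time
  decoded <- vapply(query, utils::URLdecode, character(1), USE.NAMES = FALSE)
  # Remove the leading "loc:" label and surrounding whitespace
  trimws(sub("^loc:", "", decoded))
}

urls <- c(
  "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US",
  "https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US"
)
addresses <- extract_address(urls)
print(addresses)
```

The resulting character vector can then be passed, one element at a time, to whatever geocoding request you use.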


 