
How can I GET a simple HTML form in R?

I have a similar question. I am trying to fetch the coordinates (latitude and longitude) of an address from the US Census geocoder. I followed the approach mentioned here; however, I am not getting the result I need. Here are the steps I followed in three attempts:

Attempt #1 (using RCurl):

library(RCurl)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
td.html <- getForm(url_geo,
                   submit    = "Find",
                   street    = "3211 Providence Dr",
                   city      = "Anchorage",
                   state     = "AK",
                   zip       = "99508",
                   benchmark = "Public_AR_Current",
                   .opts = curlOptions(ssl.verifypeer = FALSE))

When I look at the output of td.html, it is the same as what you get from "View Page Source" on the webpage above. Instead, td.html should contain the details of the results page that appears after submitting the form on that webpage.

Attempt #2 (using httr):

library(httr)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
fd1 <- list(
  submit    = "Find",
  street    = "3211 Providence Dr",
  city      = "Anchorage",
  state     = "AK",
  zip       = "99508",
  benchmark = "Public_AR_Current"
)
resp1 <- GET(url_geo, body = fd1, encode = "form")
content(resp1)

The content of resp1 is very different from what one would expect.

Attempt #3 (using rvest):

library(rvest)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
s <- html_session(url_geo)
f0 <- html_form(s)

Here, I get an error:

Error: Current page doesn't appear to be html.

Please help me understand what I am doing wrong. If you need any clarification from me, please let me know.

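Call the geocoder endpoint itself, without the ?form suffix, and pass the fields as query parameters: a GET request carries its parameters in the URL's query string, not in a request body. The Census site is nice enough to send you back JSON (an unexpected bonus of making this call):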

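library(httr)
library(jsonlite)

# The endpoint itself, not the "?form" page that renders the HTML form
URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

# GET parameters go in the query string via `query =`
res <- GET(URL,
           query=list(street="3211 Providence Dr",
                      city="Anchorage",
                      state="AK",
                      zip="99508",
                      benchmark=4))  # 4 = the "current" benchmark

dat <- fromJSON(content(res, as="text", encoding="UTF-8"))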

str(dat$result$addressMatches)
## 'data.frame': 1 obs. of  4 variables:
##  $ matchedAddress   : chr "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
##  $ coordinates      :'data.frame':  1 obs. of  2 variables:
##   ..$ x: num -150
##   ..$ y: num 61.2
##  $ tigerLine        :'data.frame':  1 obs. of  2 variables:
##   ..$ tigerLineId: chr "638504877"
##   ..$ side       : chr "L"
##  $ addressComponents:'data.frame':  1 obs. of  12 variables:
##   ..$ fromAddress    : chr "3001"
##   ..$ toAddress      : chr "3399"
##   ..$ preQualifier   : chr ""
##   ..$ preDirection   : chr ""
##   ..$ preType        : chr ""
##   ..$ streetName     : chr "PROVIDENCE"
##   ..$ suffixType     : chr "DR"
##   ..$ suffixDirection: chr ""
##   ..$ suffixQualifier: chr ""
##   ..$ city           : chr "ANCHORAGE"
##   ..$ state          : chr "AK"
##   ..$ zip            : chr "99508"

You can pass flatten=TRUE to fromJSON to turn that awkward data-frames-within-a-data-frame structure into plain columns:

dat <- fromJSON(content(res, as="text"), flatten=TRUE)
dplyr::glimpse(dat$result$addressMatches)

## Observations: 1
## Variables: 17
## $ matchedAddress                    (chr) "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
## $ coordinates.x                     (dbl) -149.8188
## $ coordinates.y                     (dbl) 61.18985
## $ tigerLine.tigerLineId             (chr) "638504877"
## $ tigerLine.side                    (chr) "L"
## $ addressComponents.fromAddress     (chr) "3001"
## $ addressComponents.toAddress       (chr) "3399"
## $ addressComponents.preQualifier    (chr) ""
## $ addressComponents.preDirection    (chr) ""
## $ addressComponents.preType         (chr) ""
## $ addressComponents.streetName      (chr) "PROVIDENCE"
## $ addressComponents.suffixType      (chr) "DR"
## $ addressComponents.suffixDirection (chr) ""
## $ addressComponents.suffixQualifier (chr) ""
## $ addressComponents.city            (chr) "ANCHORAGE"
## $ addressComponents.state           (chr) "AK"
## $ addressComponents.zip             (chr) "99508"
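To see what flatten=TRUE does to the nested columns, here is a rough base-R illustration. It is not jsonlite's implementation, and flatten_df is a hypothetical helper name: each data-frame column is expanded, recursively, into top-level columns named parent.child, which matches the naming in the glimpse output above.

```r
# Rough base-R illustration of jsonlite's flatten = TRUE (not its actual code):
# data-frame columns nested inside a data frame are expanded into top-level
# columns named "parent.child", recursively.
flatten_df <- function(df) {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      inner <- flatten_df(col)  # flatten any deeper levels first
      names(inner) <- paste(nm, names(inner), sep = ".")
      out <- c(out, as.list(inner))
    } else {
      out[[nm]] <- col
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Toy nested data frame shaped like dat$result$addressMatches
matches <- data.frame(
  matchedAddress = "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508",
  stringsAsFactors = FALSE
)
matches$coordinates <- data.frame(x = -149.8188, y = 61.18985)
matches$tigerLine <- data.frame(tigerLineId = "638504877", side = "L",
                                stringsAsFactors = FALSE)

flat <- flatten_df(matches)
# names(flat) is c("matchedAddress", "coordinates.x", "coordinates.y",
#                  "tigerLine.tigerLineId", "tigerLine.side")
names(flat)
```

In practice, just use flatten=TRUE (or jsonlite::flatten()); the sketch only shows why the columns end up with dotted names.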

Here is the same call wrapped in a function for easier reuse:

#' Geocode an address using the Census API
#'
#' @param street Street
#' @param city City
#' @param state State
#' @param zip Zip code
#' @param benchmark "\code{current}" for the most current information,
#'        "\code{2014}" for data from the 2014 U.S. ACS survey,
#'        "\code{2010}" for data from the 2010 U.S. Census. This defaults
#'        to "\code{current}".
#' @return \code{list} of query params and response values. If successful,
#'         the geocoded values will be in \code{var$result$addressMatches}
census_geocode <- function(street, city, state, zip,
                           benchmark = c("current", "2014", "2010")) {

  URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

  # Map the benchmark name to the Census benchmark ID; match.arg() rejects
  # unknown names instead of silently sending benchmark=NA
  benchmark <- match.arg(benchmark)
  bench <- c(`current` = 4, `2014` = 8, `2010` = 9)[[benchmark]]

  res <- httr::GET(URL,
                   query = list(street = street, city = city, state = state,
                                zip = zip, benchmark = bench))

  httr::warn_for_status(res)

  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"),
                     flatten = TRUE)

}

census_geocode("3211 Providence Dr", "Anchorage", "AK", "99508")
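Since the goal is latitude and longitude, a small helper can pull them out of the list that census_geocode returns. This is a sketch under assumptions: extract_coords is a hypothetical name, it relies on the flattened column names shown above (coordinates.x is the longitude, coordinates.y the latitude), and it assumes an address with no match comes back with a missing or empty addressMatches.

```r
# Hypothetical helper: longitude/latitude of the first match in the list
# returned by census_geocode() (parsed with fromJSON(..., flatten = TRUE)).
extract_coords <- function(geo) {
  m <- geo$result$addressMatches
  # Assumed no-match case: addressMatches is missing or has zero rows/elements
  if (is.null(m) || NROW(m) == 0) {
    return(c(lon = NA_real_, lat = NA_real_))
  }
  # x is longitude, y is latitude
  c(lon = m$coordinates.x[1], lat = m$coordinates.y[1])
}

# Offline example with a mocked, already-parsed response
geo <- list(result = list(addressMatches = data.frame(
  coordinates.x = -149.8188, coordinates.y = 61.18985)))
extract_coords(geo)
```

With a live call this would be extract_coords(census_geocode("3211 Providence Dr", "Anchorage", "AK", "99508")). Taking only the first match is a simplification; an ambiguous address can return several rows.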

Build the URL yourself and request it directly, bypassing the form entirely! For instance, the parameters you selected give the following URL:

urlgeo <- "http://geocoding.geo.census.gov/geocoder/locations/address?street=3211+Providence+Dr&city=Anchorage&state=AK&zip=99508&benchmark=4"

Then you can simply retrieve the content with RCurl's getURL:

library(RCurl)
getURL(urlgeo)

The result will contain all the information you need. To build the URL, paste its arguments together, replacing each blank space with a +.
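The paste-and-replace step can be sketched in base R. This is not part of the original answer: build_census_url is a hypothetical helper name, and it additionally runs each value through utils::URLencode so that characters such as & in a street name are percent-encoded before the encoded spaces are turned into +.

```r
# Hypothetical helper (not from the original answer): build the geocoder URL
# by pasting the arguments together, with spaces encoded as "+".
build_census_url <- function(street, city, state, zip, benchmark = 4,
                             base = "http://geocoding.geo.census.gov/geocoder/locations/address") {
  params <- c(street = street, city = city, state = state,
              zip = zip, benchmark = benchmark)
  encoded <- vapply(params, function(v) {
    # Percent-encode reserved characters (e.g. "&" -> "%26"); spaces become
    # "%20", which is then rewritten to "+" as the answer describes
    gsub("%20", "+", utils::URLencode(as.character(v), reserved = TRUE),
         fixed = TRUE)
  }, character(1))
  paste0(base, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

urlgeo <- build_census_url("3211 Providence Dr", "Anchorage", "AK", "99508")
urlgeo
```

For these inputs it reproduces the hand-built URL above, which can then be passed to getURL(urlgeo). httr's GET(URL, query = ...) performs this encoding for you, which is why the other answer never builds the string by hand.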
