How can I GET a simple HTML form in R?

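I have a similar problem. I am trying to get the coordinates (latitude and longitude) of an address from the US Census Geocoder link. I followed the approach mentioned here; however, I am not getting the desired result. Here are the steps I followed in three attempts: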

Attempt #1 (using RCurl):

library(RCurl)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
td.html <- getForm(url_geo,
                   submit = "Find",
                   street = "3211 Providence Dr",
                   city = "Anchorage",
                   state = "AK",
                   zip = "99508",
                   benchmark = "Public_AR_Current",
                   .opts = curlOptions(ssl.verifypeer = FALSE))

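When I inspect td.html, it is identical to what I get from "View page source" on the web page above. It should instead contain the result page that appears after the form on that page is submitted.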

Attempt #2 (using httr):

library(httr)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
fd1 <- list(
  submit = "Find",
  street = "3211 Providence Dr",
  city = "Anchorage",
  state = "AK",
  zip = "99508",
  benchmark = "Public_AR_Current"
)
resp1 <- GET(url_geo, body = fd1, encode = "form")
content(resp1)

The content of resp1 is very different from what I expected.

Attempt #3 (using rvest):

library(rvest)

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
s <- html_session(url_geo)
f0 <- html_form(s)

Here, I get an error:

Error: Current page doesn't appear to be html.

Please help me understand what I am doing wrong. Let me know if you need any clarification.

The Census site is kind enough to send you back JSON (which was unexpected, and a nice find from making this call):

library(httr)
library(jsonlite)

URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

res <- GET(URL,
           query=list(street="3211 Providence Dr",
                      city="Anchorage",
                      state="AK",
                      zip="99508",
                      benchmark=4))

dat <- fromJSON(content(res, as="text"))

str(dat$result$addressMatches)
## 'data.frame': 1 obs. of  4 variables:
##  $ matchedAddress   : chr "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
##  $ coordinates      :'data.frame':  1 obs. of  2 variables:
##   ..$ x: num -150
##   ..$ y: num 61.2
##  $ tigerLine        :'data.frame':  1 obs. of  2 variables:
##   ..$ tigerLineId: chr "638504877"
##   ..$ side       : chr "L"
##  $ addressComponents:'data.frame':  1 obs. of  12 variables:
##   ..$ fromAddress    : chr "3001"
##   ..$ toAddress      : chr "3399"
##   ..$ preQualifier   : chr ""
##   ..$ preDirection   : chr ""
##   ..$ preType        : chr ""
##   ..$ streetName     : chr "PROVIDENCE"
##   ..$ suffixType     : chr "DR"
##   ..$ suffixDirection: chr ""
##   ..$ suffixQualifier: chr ""
##   ..$ city           : chr "ANCHORAGE"
##   ..$ state          : chr "AK"
##   ..$ zip            : chr "99508"

You can use the flatten argument of fromJSON to deal with the awkward data-frames-within-data-frames structure:

dat <- fromJSON(content(res, as="text"), flatten=TRUE)
dplyr::glimpse(dat$result$addressMatches)

## Observations: 1
## Variables: 17
## $ matchedAddress                    (chr) "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
## $ coordinates.x                     (dbl) -149.8188
## $ coordinates.y                     (dbl) 61.18985
## $ tigerLine.tigerLineId             (chr) "638504877"
## $ tigerLine.side                    (chr) "L"
## $ addressComponents.fromAddress     (chr) "3001"
## $ addressComponents.toAddress       (chr) "3399"
## $ addressComponents.preQualifier    (chr) ""
## $ addressComponents.preDirection    (chr) ""
## $ addressComponents.preType         (chr) ""
## $ addressComponents.streetName      (chr) "PROVIDENCE"
## $ addressComponents.suffixType      (chr) "DR"
## $ addressComponents.suffixDirection (chr) ""
## $ addressComponents.suffixQualifier (chr) ""
## $ addressComponents.city            (chr) "ANCHORAGE"
## $ addressComponents.state           (chr) "AK"
## $ addressComponents.zip             (chr) "99508"
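
Once the result is flattened, the coordinates are ordinary columns. As a minimal sketch, the block below pulls the longitude and latitude into a small data frame. It uses a hand-built stand-in for dat$result$addressMatches (values copied from the printed output above, so they are rounded) rather than a live response; the names matches and coords are illustrative only.

```r
# Stand-in for dat$result$addressMatches after flatten=TRUE. The values are
# copied from the printed output above, so they are rounded.
matches <- data.frame(
  matchedAddress = "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508",
  coordinates.x = -149.8188,
  coordinates.y = 61.18985,
  stringsAsFactors = FALSE
)

# x holds the longitude and y holds the latitude.
coords <- data.frame(
  address = matches$matchedAddress,
  lon = matches$coordinates.x,
  lat = matches$coordinates.y,
  stringsAsFactors = FALSE
)

print(coords)
```

With a real response, it is probably worth checking nrow() first, since an address that cannot be matched may come back with no rows.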

This wraps it up in a function to make it easier to call:

#' Geocode address using the Census API
#'
#' @param street Street
#' @param city City
#' @param state State
#' @param zip Zip code
#' @param benchmark "\code{current}" for the most current information,
#'        "\code{2014}" for data from the 2014 U.S. ACS survey,
#'        "\code{2010}" for data from the 2010 U.S. Census. This defaults
#'        to "\code{current}".
#' @return \code{list} of query params and response values. If successful,
#'         the geocoded values will be in \code{var$result$addressMatches}
census_geocode <- function(street, city, state, zip, benchmark="current") {

  URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

  # Map the benchmark name to its numeric id; `[[` drops the name and
  # raises an error for an unknown benchmark instead of returning NA
  bench <- c(`current`=4, `2014`=8, `2010`=9)[[benchmark]]

  res <- GET(URL,
             query=list(street=street, city=city, state=state,
                        zip=zip, benchmark=bench))

  # Warn (rather than stop) if the server returned an HTTP error status
  warn_for_status(res)

  fromJSON(content(res, as="text"), flatten=TRUE)

}

census_geocode("3211 Providence Dr", "Anchorage", "AK", "99508")
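
The benchmark lookup can also be validated on its own before any request is sent. The sketch below is one possible way to do that with match.arg(); benchmark_id is a hypothetical helper that is not part of the answer, and the numeric ids are simply the ones used in the function above.

```r
# Hypothetical helper: maps a benchmark name to the numeric id used in the query.
benchmark_id <- function(benchmark = c("current", "2014", "2010")) {
  # match.arg() stops with an informative error for any other value
  # and falls back to the first choice when the argument is missing.
  benchmark <- match.arg(benchmark)
  ids <- c(`current` = 4, `2014` = 8, `2010` = 9)
  ids[[benchmark]]
}

benchmark_id("2014")
```

Calling benchmark_id("1990") raises an error immediately, which is easier to diagnose than a request sent with a missing benchmark.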

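Build the URL yourself and submit it directly, bypassing the form altogether! For example, with the parameters you chose, you get the following URL: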

urlgeo<-"http://geocoding.geo.census.gov/geocoder/locations/address?street=3211+Providence+Dr&city=Anchorage&state=AK&zip=99508&benchmark=4"

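Then you simply retrieve the content with getURL:

library(RCurl)
getURL(urlgeo)

The result will contain all the information you need. To build the URL, just paste its parameters together and replace any spaces with +.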
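
The paste-and-replace step could look roughly like this. It is a minimal base R sketch: build_geocode_url is a hypothetical helper name, and it only replaces spaces with +, so values containing other reserved characters (such as & or #) would need proper URL encoding.

```r
# Hypothetical helper: builds the query URL by pasting name=value pairs together.
build_geocode_url <- function(base, params) {
  # Convert every parameter to a string and replace spaces with "+".
  # Other reserved characters are NOT escaped here.
  values <- vapply(params, as.character, character(1))
  values <- gsub(" ", "+", values, fixed = TRUE)
  query <- paste(names(params), values, sep = "=", collapse = "&")
  paste0(base, "?", query)
}

urlgeo <- build_geocode_url(
  "http://geocoding.geo.census.gov/geocoder/locations/address",
  list(street = "3211 Providence Dr", city = "Anchorage",
       state = "AK", zip = "99508", benchmark = 4)
)

print(urlgeo)
```

The resulting string can then be passed to getURL as before.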
