When scraping with rvest, expected html_node not appearing

Once a query is submitted, the ITTO website produces a table of timber products and flows directly under the search form, on the same page. Based on what Chrome's SelectorGadget shows, I expect the table cells to be selectable with the CSS selector "td". Here is my attempt to use rvest to scrape the data for Albania for 2014:

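library(rvest)

# Pre-1.0 rvest API (renamed in rvest 1.0 to session(), html_form_set(), session_submit())
session <- html_session("http://www.itto.int/annual_review_output/?mode=searchdata")
form <- html_form(session)[[2]]  # the search form (second form on the page)
form <- set_values(form, "countries[]" = "8", "products[]" = "1", "flows[]" = "1", "years[]" = "2014")
query <- submit_form(session, form, submit = NULL)
page <- read_html(query) %>% html_nodes("td")  # cells of the expected result table
page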

This returns an empty node set, so no "td" elements are found at all:

{xml_nodeset (0)}

Inspecting other elements of the returned page with html_nodes() suggests that submit_form() otherwise worked as expected.

So my question is: where is the expected table?

In the long run, it may be easier to scrape the select-box option values yourself and pass them directly in a POST request:

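library(httr)
library(rvest)

# Send the form fields straight to the endpoint as a form-encoded POST,
# skipping the session/form-submission step entirely.
# countries[] = "76" is Brazil (see the output below); the question used "8" (Albania).
res <- POST(url = "http://www.itto.int/annual_review_output/?mode=searchdata",
            body = list(`countries[]` = "76",
                        `products[]` = "1", `flows[]` = "1",
                        `years[]` = "2014"),
            encode = "form")

# httr parses a text/html response into an xml2 document
pg <- content(res, as = "parsed")
html_nodes(pg, "td")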

## {xml_nodeset (7)}
## [1] <td>Brazil</td>
## [2] <td>Ind. roundwood</td>
## [3] <td>Exports Quantity</td>
## [4] <td>1000 m3</td>
## [5] <td>2014</td>
## [6] <td style="text-align:right;">204.59</td>
## [7] <td>I</td>
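The answer mentions scraping the select-box options but does not show that step. Below is a minimal sketch of it under stated assumptions: the HTML fragment is hypothetical and only mimics a select box named "countries[]" (the real ITTO markup is not shown here and should be checked before relying on the selector), and select_options() is a hypothetical helper, not part of rvest. It maps each option's visible label to its value attribute, which is what the POST body expects.

```r
library(rvest)

# Hypothetical stand-in for the live search page; on the real site you would
# presumably call read_html("http://www.itto.int/annual_review_output/?mode=searchdata").
html <- '<form>
  <select name="countries[]" multiple>
    <option value="8">Albania</option>
    <option value="76">Brazil</option>
  </select>
</form>'
pg <- read_html(html)

# Hypothetical helper: returns a named character vector, label -> option value
select_options <- function(page, name) {
  opts <- html_nodes(page, sprintf("select[name='%s'] option", name))
  setNames(html_attr(opts, "value"), trimws(html_text(opts)))
}

countries <- select_options(pg, "countries[]")

# Build the form-encoded body for the POST call by label instead of by raw code
body <- list(`countries[]` = countries[["Brazil"]],
             `products[]` = "1", `flows[]` = "1", `years[]` = "2014")
# Then, for example: httr::POST(url, body = body, encode = "form")
```

Looking values up by label keeps the POST body readable and makes it straightforward to loop over several countries or years.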

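Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.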

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM