When scraping with rvest, expected html_node not appearing
The ITTO website produces a table of timber products and flows directly under the search form once the query is submitted (on the same page). Using information I obtained from Chrome's SelectorGadget, I'm expecting the table cells to appear as the CSS element "td".
Using rvest to scrape information on Albania for 2014...
library(rvest)
session <- html_session("http://www.itto.int/annual_review_output/?mode=searchdata")
form <- html_form(session)[[2]]
form <- set_values(form, "countries[]" = "8", "products[]" = "1" ,"flows[]" = "1", "years[]" = "2014")
query <- submit_form(session, form, submit = NULL)
page <- read_html(query) %>% html_nodes("td")
page
This results in the table "td" being absent:
{xml_nodeset (0)}
Examining other elements of the page with html_nodes() suggests that submit_form() otherwise performed as expected.
So my question is: where is the expected table?
It might be easier (in the long run) to scrape the select box options and just feed the POST call directly:
library(httr)
library(rvest)
res <- POST(url = "http://www.itto.int/annual_review_output/?mode=searchdata",
body = list(`countries[]` = "76",
`products[]` = "1", `flows[]` = "1",
`years[]` = "2014"),
encode = "form")
pg <- content(res, as="parsed")
html_nodes(pg, "td")
## {xml_nodeset (7)}
## [1] <td>Brazil</td>
## [2] <td>Ind. roundwood</td>
## [3] <td>Exports Quantity</td>
## [4] <td>1000 m3</td>
## [5] <td>2014</td>
## [6] <td style="text-align:right;">204.59</td>
## [7] <td>I</td>
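The "scrape the select box options" step could look roughly like the sketch below. It is only an illustration: the inline HTML is a hypothetical stand-in for the search form (the real markup may differ), and the assumption that the select box is named "countries[]" is inferred from the form field names used above. On the live site you would pass the search page URL to read_html() instead of the string.

```r
library(rvest)

# Hypothetical stand-in for the search form's markup; on the live site this
# would come from read_html() on the search page URL instead.
form_html <- '<form>
  <select name="countries[]">
    <option value="8">Albania</option>
    <option value="76">Brazil</option>
  </select>
</form>'

page <- read_html(form_html)

# Select the <option> nodes of the select box named "countries[]" (assumed name)
opts <- html_nodes(page, "select[name='countries[]'] option")

# Build a lookup table from the visible label to the code the form submits
countries <- data.frame(
  code = html_attr(opts, "value"),
  name = html_text(opts),
  stringsAsFactors = FALSE
)

# Code to pass as `countries[]` in the POST body
albania_code <- countries$code[countries$name == "Albania"]
```

With a lookup table like this, the value for `countries[]` in the POST body can be chosen by name rather than hard-coded, and the same approach should apply to the products, flows and years boxes.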