
Web crawling with R: taking only the first matching value

I'm currently doing web crawling in R with the XML package, using the POST and xpathSApply functions. When two or more values satisfy the search criteria, I'd like to take only the first value.

In the image, I'd like to extract only the "짜증 나" part, located between <li> and </li>. Currently, I'm using the following command

tdReplace = xpathSApply(html, "//td[@class='tdReplace']/ul/li[2]/a", xmlValue)

without success. How should I go about fixing this?

[image: screenshot of the page's HTML source showing the target <li> elements]
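
For reference (not part of the original question), the first match can also be taken with the XML package itself, either by restricting the XPath or by indexing the returned vector. This is a sketch that assumes html is the document you already parsed from the POST response, as in the xpathSApply() call above:

library(XML)

# Option 1: restrict the XPath to the first <li> of each matching <ul>
first_value <- xpathSApply(html, "//td[@class='tdReplace']/ul/li[1]/a", xmlValue)

# Option 2: collect every match and keep only the first element of the result
all_values  <- xpathSApply(html, "//td[@class='tdReplace']/ul/li/a", xmlValue)
first_value <- all_values[1]

Note that li[1] selects the first <li> within each matching <ul>, so Option 2 is the more robust way to guarantee a single value when several <td class="tdReplace"> cells exist.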

Consider using rvest instead. It includes a function html_node(), which returns the first instance of the matching node.

Without seeing your HTML it is difficult to test, but to parse the HTML at URL my_url, something like this should work:

library(rvest)

my_url %>%
  read_html() %>%                        # fetch and parse the page
  html_node("td.tdReplace ul li a") %>%  # first <a> under td.tdReplace > ul > li
  html_text()                            # extract its text
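
As an aside (an addition, not part of the original answer), html_node() also accepts an XPath directly, so the selector from the question can be reused as-is; html_nodes() would return every match if you later need more than the first:

library(rvest)

my_url %>%
  read_html() %>%
  html_node(xpath = "//td[@class='tdReplace']/ul/li/a") %>%  # first match only
  html_text()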
