
Web crawling with R: taking only the first matching value

I'm currently doing web crawling in R with the XML package, using the POST and xpathSApply functions. When two or more values satisfy the search criteria, I'd like to take only the first value.

In the image, I'd like to extract only the "짜증 나" part, located between <li> and </li>. Currently, I'm using the following command

tdReplace = xpathSApply(html, "//td[@class='tdReplace']/ul/li[2]/a", xmlValue)

without success. How should I go about fixing this?

[image: screenshot of the page's HTML source showing the target <li> elements]
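
For reference (not part of the original question), the first match can also be taken with the XML package itself, either by restricting the XPath or by indexing the returned vector. This is a sketch that assumes html is the document you already parsed from the POST response, as in the xpathSApply() call above:

library(XML)

# Option 1: restrict the XPath to the first <li> of each matching <ul>
first_value <- xpathSApply(html, "//td[@class='tdReplace']/ul/li[1]/a", xmlValue)

# Option 2: collect every match and keep only the first element of the result
all_values  <- xpathSApply(html, "//td[@class='tdReplace']/ul/li/a", xmlValue)
first_value <- all_values[1]

Note that li[1] selects the first <li> within each matching <ul>, so Option 2 is the more robust way to guarantee a single value when several <td class="tdReplace"> cells exist.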

Consider using rvest instead. It includes a function html_node(), which returns the first instance of the matching node.

Without seeing your HTML it is difficult to test, but to parse the HTML at URL my_url, something like this should work:

library(rvest)

my_url %>%
  read_html() %>%                        # fetch and parse the page
  html_node("td.tdReplace ul li a") %>%  # first <a> under td.tdReplace > ul > li
  html_text()                            # extract its text
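
As an aside (an addition, not part of the original answer), html_node() also accepts an XPath directly, so the selector from the question can be reused as-is; html_nodes() would return every match if you later need more than the first:

library(rvest)

my_url %>%
  read_html() %>%
  html_node(xpath = "//td[@class='tdReplace']/ul/li/a") %>%  # first match only
  html_text()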
