Scraping HTML Table with XML in R

I am trying to scrape text values from a website. I have been able to parse the URL, but I am new to XPath in R, so I am not sure how to pull out all the text values inside tags of the form

<p class="MsoNormal" align="justify"> text </p>

How do I specify the path to that specific tag and get its text value? This is what I am trying right now:

pizzaraw<-xpathSApply(pizzadoc, "//p[@class='MsoNormal']", xmlValue)

Is this the right approach? R does not seem to respond to the code.

It's difficult to know what is wrong, given that your example is not self-contained, but here is a self-contained one that works:

Lines <- '<html>
<p class="MsoNormal" align="justify"> text </p>
</html>
'

library(XML)
root <- htmlTreeParse(Lines, asText = TRUE, useInternalNodes = TRUE)
doc <- xmlRoot(root)
xpathSApply(doc, '//p[@class="MsoNormal"]', xmlValue, trim = TRUE)
## [1] "text"
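To target exactly the tag from the question, the XPath predicate can test both attributes at once. The sketch below uses a made-up HTML fragment (not the asker's page) to show an exact class match combined with align="justify", and how contains() behaves when the class attribute holds more than one class name. htmlParse() is the XML package's shorthand for htmlTreeParse(..., useInternalNodes = TRUE).

```r
library(XML)

# Made-up fragment: the third paragraph has two class names,
# and the div shares the class but is not a <p> element.
Lines <- '<html><body>
<p class="MsoNormal" align="justify"> first </p>
<p class="MsoNormal" align="center"> heading </p>
<p class="MsoNormal Extra" align="justify"> second </p>
<div class="MsoNormal"> not a paragraph </div>
</body></html>'

# htmlParse() builds the internal document that xpathSApply() works on
doc <- htmlParse(Lines, asText = TRUE)

# Exact class match AND a second attribute test inside one predicate
justified <- xpathSApply(doc, "//p[@class='MsoNormal' and @align='justify']",
                         xmlValue, trim = TRUE)

# contains() also picks up class="MsoNormal Extra" (plain substring test)
loose <- xpathSApply(doc, "//p[contains(@class, 'MsoNormal') and @align='justify']",
                     xmlValue, trim = TRUE)

print(justified)
print(loose)
```

The exact match returns only "first", while the contains() version also returns "second". Because contains() is a substring test, it would also match a class name such as MsoNormalTable, so the exact @class comparison is the safer choice when the attribute holds a single class.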

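If the page has already been saved to disk, htmlParse() can take the file name directly instead of a string with asText = TRUE. A minimal sketch, assuming a locally saved page (a temporary file stands in for it here, and pizzadoc/pizzaraw simply reuse the question's variable names):

```r
library(XML)

# Hypothetical stand-in for the saved page: a small HTML file written to a
# temporary location so the sketch runs without network access.
page_file <- tempfile(fileext = ".html")
writeLines(c('<html><body>',
             '<p class="MsoNormal" align="justify"> saved text </p>',
             '</body></html>'), page_file)

# Given a file name (no asText), htmlParse() reads and parses the file
pizzadoc <- htmlParse(page_file)
pizzaraw <- xpathSApply(pizzadoc, "//p[@class='MsoNormal']", xmlValue, trim = TRUE)
print(pizzaraw)
```

Parsing a local copy also makes it easy to check whether a problem lies in fetching the page or in the XPath expression itself.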