
R: Scraping an HTML webpage using XML

I am trying to scrape this webpage using the following code.

library(XML)
url <- "http://www.gallop.co.za/"
doc <- htmlParse(url)
lat <- xpathSApply(doc, path = "//p[@id=Racecards]", fun = xmlGetAttr, name = 'Racecards')

I looked at the webpage, and the table I want to scrape is the racecards table, primarily to get the links to where the racecard data is.

I used SelectorGadget, which returns the XPath as:

//*[(@id = "Racecards")]

However, when I run the R code, it returns an empty list. It feels like I'm getting the XPath wrong somehow. What is the correct way to return the table, and also the links within it?
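For reference, here is a sketch of how that SelectorGadget XPath would be used with the XML package, assuming the table were present in the static HTML (which, as the answer below explains, it is not). Note that the attribute value must be quoted inside the XPath expression, and that the position passed to readHTMLTable is an assumption:

library(XML)

doc <- htmlParse("http://www.gallop.co.za/")

# The attribute value must be quoted inside the XPath expression
nodes <- getNodeSet(doc, "//*[@id='Racecards']")

# Links inside that element would come from the href attribute of <a> tags
links <- xpathSApply(doc, "//*[@id='Racecards']//a", xmlGetAttr, "href")

# readHTMLTable() would pull the table itself; which = 1 is an assumption
# about the table's position on the page
tbl <- readHTMLTable(doc, which = 1)

On this page, these calls return empty results because the table is not in the HTML the server sends.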

It seems that the data are delivered as JSON and inserted into the page with JavaScript, so you can't get them from the static HTML. You can fetch the JSON directly instead.

library(RCurl)
library(jsonlite)

# Fetch the JSON feed the page loads, then parse it into R objects
p <- getURL("http://www.gallop.co.za/cache/horses.json")
fromJSON(p)
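Once parsed, the result can be inspected like any other R object; when the feed is an array of JSON objects, fromJSON() simplifies it to a data frame. A minimal sketch follows; the field name url is hypothetical, since the feed's actual schema isn't shown here:

library(RCurl)
library(jsonlite)

horses <- fromJSON(getURL("http://www.gallop.co.za/cache/horses.json"))

# For an array of JSON objects, fromJSON() returns a data frame
str(horses)    # inspect structure and column names
head(horses)   # preview the first rows

# Extract a column once you know the real field names; 'url' here is
# a hypothetical example, not a confirmed field in this feed
# racecard_links <- horses$url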
