
R: Scraping an HTML webpage using XML

I am trying to scrape this webpage using the following code.

library(XML)
url <- "http://www.gallop.co.za/"
doc <- htmlParse(url)
lat <- xpathSApply(doc, path = "//p[@id=Racecards]", fun = xmlGetAttr, name = 'Racecards')

I looked at the webpage, and the table I want to scrape is the racecards table, primarily to get the links to where the racecard data is.

I used SelectorGadget, which returns the XPath as:

//*[(@id = "Racecards")]

However, when I run the R code, it returns an empty list. It feels like I'm getting the XPath wrong somehow. What is the correct way to return the table, and also the links within it?
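For reference, here is a sketch of how that SelectorGadget XPath would be used with the XML package, assuming the table were present in the static HTML (which, as the answer below explains, it is not). Note that the attribute value must be quoted inside the XPath expression, and that the position passed to readHTMLTable is an assumption:

library(XML)

doc <- htmlParse("http://www.gallop.co.za/")

# The attribute value must be quoted inside the XPath expression
nodes <- getNodeSet(doc, "//*[@id='Racecards']")

# Links inside that element would come from the href attribute of <a> tags
links <- xpathSApply(doc, "//*[@id='Racecards']//a", xmlGetAttr, "href")

# readHTMLTable() would pull the table itself; which = 1 is an assumption
# about the table's position on the page
tbl <- readHTMLTable(doc, which = 1)

On this page, these calls return empty results because the table is not in the HTML the server sends.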

It seems that the data are delivered as JSON and inserted into the page with JavaScript, so you can't get them from the static HTML. You can fetch the JSON directly instead.

library(RCurl)
library(jsonlite)

# Fetch the JSON feed the page loads, then parse it into R objects
p <- getURL("http://www.gallop.co.za/cache/horses.json")
fromJSON(p)
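Once parsed, the result can be inspected like any other R object; when the feed is an array of JSON objects, fromJSON() simplifies it to a data frame. A minimal sketch follows; the field name url is hypothetical, since the feed's actual schema isn't shown here:

library(RCurl)
library(jsonlite)

horses <- fromJSON(getURL("http://www.gallop.co.za/cache/horses.json"))

# For an array of JSON objects, fromJSON() returns a data frame
str(horses)    # inspect structure and column names
head(horses)   # preview the first rows

# Extract a column once you know the real field names; 'url' here is
# a hypothetical example, not a confirmed field in this feed
# racecard_links <- horses$url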
