
R - Extracting Tables From Websites Using XML Package

I am trying to replicate the method from a previous answer, Scraping html tables into R data frames using the XML package, for my own work, but I cannot get the data to extract. The website I am using is: http://www.footballfanalytics.com/articles/football/euro_super_league_table.html

I just wish to extract a table of each team name and their current rating score. My code is as follows:

library(XML)
theurl <- "http://www.footballfanalytics.com/articles/football/euro_super_league_table.html"
tables <- readHTMLTable(theurl)
# Keep the table with the most rows
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
tables[[which.max(n.rows)]]

This produces the error message:

Error in tables[[which.max(n.rows)]] : 
attempt to select less than one element

Could anyone suggest a solution, please? Is there something about this particular site that causes this not to work? Or is there a better alternative method I could try? Thanks.

It seems the data is loaded via JavaScript, so readHTMLTable finds no tables in the static HTML: tables comes back as an empty list, n.rows is NULL, and which.max(NULL) returns a zero-length index, which is what triggers the error above. Try fetching the underlying XML feed directly:

library(XML)
theurl <- "http://www.footballfanalytics.com/xml/esl/esl.xml"
doc <- xmlParse(theurl)
# Pull each team's name and points via XPath
cbind(team = xpathSApply(doc, "/StatsData/Teams/Team/Name", xmlValue),
      points = xpathSApply(doc, "/StatsData/Teams/Team/Points", xmlValue))
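
If a proper data frame with numeric scores is more convenient, the same XPath queries can feed data.frame directly. A minimal sketch, assuming the feed keeps the /StatsData/Teams/Team structure used above:

library(XML)

doc <- xmlParse("http://www.footballfanalytics.com/xml/esl/esl.xml")

# Build a data frame; convert points to numeric so the table can be sorted
esl <- data.frame(
  team   = xpathSApply(doc, "/StatsData/Teams/Team/Name", xmlValue),
  points = as.numeric(xpathSApply(doc, "/StatsData/Teams/Team/Points", xmlValue)),
  stringsAsFactors = FALSE
)

# Teams ranked by rating, highest first
esl[order(-esl$points), ]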
