How can I scrape a table from a PHP website using R?

Question

Looking to import data into R from a table on this page:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

I've tried multiple methods using XML and httr with no luck. I have already looked at past posts, including:

and

Scraping html tables into R data frames using the XML package

I'm wondering whether I'm using the wrong table ID from the page source, or whether the table isn't in a format that the tools I'm using can parse.

Any and all help is much appreciated! Thanks in advance!

Answer 1

This won't give you exactly what you want, but it might help get you started:

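library(XML)

# Save the page locally so it can be parsed (and re-parsed) without re-downloading
fname <- "standings20190910.html"
download.file("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10", destfile=fname)

# Parse the saved HTML and get the root node of the document
doc0 <- htmlParse(file=fname, encoding="UTF-8")
doc1 <- xmlRoot(doc0)

# Select the table whose id attribute is "content"
doc2 <- getNodeSet(doc1, "//table[@id='content']")

# Convert the first matching table to a data frame, skipping its first row
standings <- readHTMLTable(doc2[[1]], header=TRUE, skip.rows=1, stringsAsFactors=FALSE)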

Look at the HTML source of the table you're trying to scrape (its id and how its header and data rows are laid out), and then work out how to turn it into a useful R object. Read the documentation for getNodeSet and readHTMLTable carefully in the manual of the XML package (https://cran.r-project.org/web/packages/XML/XML.pdf).
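To check whether the table ID is the problem, it can help to list the id of every table in the parsed document before selecting one. The sketch below does this on a small inline HTML string rather than the real page: the markup, the team names, and the "nav" id are made up for illustration, and only the "content" id is taken from the code above. The real page's structure may differ, so treat this as a pattern to adapt, not a drop-in solution.

```r
library(XML)

# Hypothetical stand-in for the downloaded page: two tables, each with an id.
html <- '<html><body>
<table id="nav"><tr><td>Home</td></tr></table>
<table id="content">
<tr><th>Team</th><th>W</th><th>L</th></tr>
<tr><td>AAA</td><td>90</td><td>55</td></tr>
<tr><td>BBB</td><td>70</td><td>75</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# List the id attribute of every table to see which ids actually exist.
table_ids <- xpathSApply(doc, "//table", xmlGetAttr, "id",
                         default = NA_character_)

# Select the table by id and convert it to a data frame.
nodes <- getNodeSet(doc, "//table[@id='content']")
standings <- readHTMLTable(nodes[[1]], header = TRUE,
                           stringsAsFactors = FALSE)

# Cell values come back as character; convert numeric columns explicitly.
standings$W <- as.integer(standings$W)
standings$L <- as.integer(standings$L)
```

If getNodeSet returns an empty list for the id you expect, the id is wrong or the table is not present in the downloaded HTML. Printing table_ids shows which ids are available to select.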

solution1
0 ACCEPTED 2019-12-15 01:33:44