
How can I scrape a table from a PHP website using R?

I'm looking to import data into R from a table on this page:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

I've tried multiple methods using XML and httr with no luck. I have already looked at past posts, including:

Read data from a php website with R

and

Scraping html tables into R data frames using the XML package

I'm wondering whether I'm using the wrong table ID from the page source, or whether the table isn't in a format that the tools I'm using can handle.

Any and all help is much appreciated! Thanks in advance!

This won't give you exactly what you want, but it might help get you started:

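library(XML)
# Save the page locally first, then parse the saved copy
fname <- "standings20190910.html"
download.file("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10", destfile=fname)
doc0 <- htmlParse(file=fname, encoding="UTF-8")
doc1 <- xmlRoot(doc0)
# Select the table by its id attribute (change the id if the page source shows a different one)
doc2 <- getNodeSet(doc1, "//table[@id='content']")
# Convert the first matching node to a data frame, skipping the first row
standings <- readHTMLTable(doc2[[1]], header=TRUE, skip.rows=1, stringsAsFactors=FALSE)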

Look at the HTML source of the table you're trying to scrape to work out how to turn it into a useful R object. The documentation for getNodeSet and readHTMLTable in the manual of the XML package ( https://cran.r-project.org/web/packages/XML/XML.pdf ) is worth reading carefully.
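If you're unsure which table ID to use, one option is to list the id attribute of every table in the parsed document before picking one. The sketch below runs against a small made-up HTML string, not the real page, so the markup, the ids "nav" and "content", and the team rows are all assumptions for illustration only:

```r
library(XML)

# Hypothetical stand-in markup; the real page's structure may differ.
html <- '<html><body>
<table id="nav"><tr><td>Home</td></tr></table>
<table id="content">
<tr><th>Team</th><th>W</th><th>L</th></tr>
<tr><td>A</td><td>90</td><td>55</td></tr>
<tr><td>B</td><td>70</td><td>75</td></tr>
</table>
</body></html>'

# Parse the string directly instead of a downloaded file
doc <- htmlParse(html, asText = TRUE)

# Collect the id attribute of every <table>; NA marks tables without one
table_ids <- xpathSApply(doc, "//table", xmlGetAttr, "id",
                         default = NA_character_)
print(table_ids)

# Select one table by id and convert it to a data frame
nodes <- getNodeSet(doc, "//table[@id='content']")
standings <- readHTMLTable(nodes[[1]], header = TRUE,
                           stringsAsFactors = FALSE)
str(standings)
```

On the real page you would replace the html string with the downloaded file, i.e. htmlParse(file=fname, encoding="UTF-8"). If the table you want has no id, length(getNodeSet(doc, "//table")) tells you how many tables there are, and you can index into them by position instead.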


 