Reading HTML source code in R using the XML library and htmlTreeParse

I am new to this, so there may be a simple solution.

I want to be able to read in the source code to extract nodes from the HTML file.

library(XML)
url <- ("https://www.mlb.com/marlins")
html <- htmlTreeParse(url, useInternal=T)

The problem is that when I run this, I get an error message saying: "XML content does not seem to be XML: ''"

Thanks in advance.

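The error occurs because htmlTreeParse never receives the page: the XML package's built-in downloader does not support HTTPS URLs, so the parser is handed content it cannot recognise as XML or HTML. To read the source code, download the page first, for example with httr: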

library(httr)
html <- httr::content(httr::GET("https://www.mlb.com/marlins"))
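
Since the goal is to extract nodes, here is a minimal sketch of fetching the page with httr and then handing the text to the XML package. The helper names fetch_html_text and extract_text, the sample HTML string, and the XPath expressions are assumptions made for illustration; they are not part of the original answer.

```r
library(XML)
library(httr)

# Fetch the raw HTML as text. httr handles HTTPS; the request fails loudly
# on an HTTP error status instead of returning an error page silently.
fetch_html_text <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Parse HTML text into an internal document and return the text content
# of every node matched by the XPath expression.
extract_text <- function(html_text, xpath) {
  doc <- XML::htmlParse(html_text, asText = TRUE)
  XML::xpathSApply(doc, xpath, XML::xmlValue)
}

# Offline demonstration on a made-up snippet (hypothetical content).
sample_html <- "<html><body><h1>Marlins</h1><a href='/schedule'>Schedule</a></body></html>"
link_text <- extract_text(sample_html, "//a")
print(link_text)

# Live use (requires network access):
# page <- fetch_html_text("https://www.mlb.com/marlins")
# extract_text(page, "//title")
```

Keeping the download and the parsing in separate functions makes it easier to see which step fails, and lets the parsing step be tried on a local string first.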

Alternatively, you can use rvest::read_html to read the source, since it can fetch HTTPS URLs directly.

data <- rvest::read_html("https://www.mlb.com/marlins")
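
Once the page is parsed, nodes can be selected with CSS selectors. The following is a rough sketch, assuming rvest 1.0 or later (for html_elements); the sample HTML and the selectors are made up for illustration and do not reflect the actual structure of the MLB page.

```r
library(rvest)

# read_html also accepts a literal HTML string, which allows an offline demo.
sample_html <- "<html><body><h1>Marlins</h1><a href='/schedule'>Schedule</a><a href='/roster'>Roster</a></body></html>"
doc <- rvest::read_html(sample_html)

# Select all <a> nodes, then pull out their text and their href attributes.
links <- rvest::html_elements(doc, "a")
link_text <- rvest::html_text(links)
link_href <- rvest::html_attr(links, "href")
print(data.frame(text = link_text, href = link_href))

# Live use (requires network access):
# data <- rvest::read_html("https://www.mlb.com/marlins")
# rvest::html_text(rvest::html_elements(data, "title"))
```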

