R: Reading HTML source code using the XML library and htmlTreeParse. I am new to this, so it may be a simple solution

I want to read in a page's source code so that I can extract nodes from the HTML.

library(XML)
url <- ("https://www.mlb.com/marlins")
html <- htmlTreeParse(url, useInternal=T)

The issue is that when I try this, I get an error message saying: "XML content does not seem to be XML: ''"

Thanks in advance.

Because it is not really an XML file. To read the source code, try the following script:

library(httr)
html <- httr::content(httr::GET("https://www.mlb.com/marlins"))
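
If the goal is still to use the XML package's node functions, one option is to fetch the page text separately and pass it to htmlParse with asText = TRUE. A likely cause of the original error is that the XML package cannot fetch https:// URLs by itself, so the parser ends up with an empty string; that is an assumption here, not something confirmed in the question. Note also that httr::content() with default settings returns an xml2 document for HTML responses, not an XML-package object. The sketch below uses a small made-up HTML string so that it runs offline; the XPath expressions are only illustrative.

```r
library(XML)

# Parse HTML that is already in memory. In practice the text could come from
#   httr::content(httr::GET(url), as = "text", encoding = "UTF-8")
# The inline document below is invented for illustration.
page_text <- '<html><head><title>Sample</title></head>
<body><a href="/schedule">Schedule</a><a href="/roster">Roster</a></body></html>'

# asText = TRUE tells the parser the argument is HTML content, not a file or URL.
doc <- htmlParse(page_text, asText = TRUE)

# Extract nodes with XPath: the href of every link, and the page title.
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
title <- xpathSApply(doc, "//title", xmlValue)
```

With a real page, only the source of page_text changes; the parsing and XPath steps stay the same.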

You can use rvest::read_html to read the source:

data <- rvest::read_html("https://www.mlb.com/marlins")
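
Once the page is read, nodes can be pulled out with CSS selectors. This is a minimal sketch assuming rvest 1.0 or later (for html_elements and html_text2); the inline HTML and the selectors are made up for illustration, and read_html accepts a URL in exactly the same way.

```r
library(rvest)

# read_html takes a URL, a file path, or literal HTML; literal HTML keeps this offline.
page <- read_html('<html><body><h1>Marlins</h1><a href="/tickets">Tickets</a></body></html>')

# Select nodes by CSS selector, then extract their text or attributes.
headings <- html_text2(html_elements(page, "h1"))
hrefs <- html_attr(html_elements(page, "a"), "href")
```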
