
R: Reading HTML source code using the XML library and htmlTreeParse. I am new to this, so it may be a simple solution

Original post: 2021-04-29 22:53:06. Tags: html, r, xml

I want to read in a page's source code so that I can extract nodes from the HTML.

library(XML)
url <- "https://www.mlb.com/marlins"
html <- htmlTreeParse(url, useInternalNodes = TRUE)

The issue is that when I try this, I get an error message saying: "XML content does not seem to be XML: ''"

Thanks ahead of time.

2 Answers

Because it is really not an XML file. To read the source code, try the following script:

library(httr)
html <- httr::content(httr::GET("https://www.mlb.com/marlins"))
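If you would rather keep working with the XML package, one option is to download the page with httr and pass the text to the parser, since htmlTreeParse is likely failing because it cannot fetch the https URL itself. A minimal sketch, assuming httr and XML are installed; fetch_html_tree and the small snippet are hypothetical names used only for illustration:

```r
library(httr)
library(XML)

# Hypothetical helper: download over HTTPS with httr, then parse the text.
fetch_html_tree <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # stop with an error on a failed HTTP request
  page_text <- httr::content(resp, as = "text", encoding = "UTF-8")
  XML::htmlParse(page_text, asText = TRUE)  # parse the string, not the URL
}

# Offline demonstration of the parsing step on a made-up snippet.
snippet <- "<html><head><title>Demo</title></head><body><a href='/a'>A</a></body></html>"
doc <- XML::htmlParse(snippet, asText = TRUE)

# Extract nodes with XPath; xmlValue returns the text inside each match.
page_title <- XML::xpathSApply(doc, "//title", XML::xmlValue)
print(page_title)
```

For the real page you would call fetch_html_tree("https://www.mlb.com/marlins") and run the same xpathSApply queries on the result.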

You can use rvest::read_html to read the source.

data <- rvest::read_html("https://www.mlb.com/marlins")
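Once the page is read, the nodes can be pulled out with CSS selectors. A rough sketch, assuming rvest 1.0.0 or later (older versions use html_nodes instead of html_elements); the snippet and the "a" selector are made up for illustration and are not taken from the actual mlb.com page:

```r
library(rvest)

# Hypothetical snippet standing in for the downloaded page;
# read_html() accepts a URL in the same position.
snippet <- "<html><body><h1>Team</h1><a href='/roster'>Roster</a><a href='/schedule'>Schedule</a></body></html>"
page <- rvest::read_html(snippet)

# Select every <a> node, then read its text and its href attribute.
links <- rvest::html_elements(page, "a")
link_text <- rvest::html_text2(links)
link_href <- rvest::html_attr(links, "href")

print(data.frame(text = link_text, href = link_href))
```

Swap the selector for whatever nodes you need.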


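Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.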
