
Get url table into a `data.frame` R-XML-RCurl

I'm trying to get the table from a URL into a data.frame. In other examples I found that the following code worked:

library(XML)
library(RCurl)
theurl <- "https://es.finance.yahoo.com/q/cp?s=BEL20.BR"
tables <- readHTMLTable(theurl)

As the warning says, the content doesn't seem to be XML:

Warning message: XML content does not seem to be XML: 'https://es.finance.yahoo.com/q/cp?s=BEL20.BR'

Alternatively, getURLContent(theurl, ssl.verifypeer = FALSE, useragent = "R") works, but I don't know how to extract the table. Any help would be appreciated.

EDIT: thanks to @har07, using table <- readHTMLTable(getURLContent(theurl, ssl.verifypeer = FALSE, useragent = "R"))$yfncsumtab gives the output, but it still has to be filtered.
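A minimal sketch of that last step, assuming the summary table is still exposed under the name yfncsumtab and that the rows to discard are the ones that come back empty or NA (the component name is taken from the edit above; Yahoo may change the page structure):

library(XML)
library(RCurl)

theurl <- "https://es.finance.yahoo.com/q/cp?s=BEL20.BR"
# fetch the page body first, then hand the HTML string to readHTMLTable
html <- getURLContent(theurl, ssl.verifypeer = FALSE, useragent = "R")
# pick the summary table by its name (name assumed from the page source)
tab <- readHTMLTable(html)$yfncsumtab
# rough clean-up: drop rows that are entirely empty or NA
tab <- tab[rowSums(is.na(tab) | tab == "") < ncol(tab), , drop = FALSE]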

You can get the table if you use getURL to fetch the document content first. Sometimes readHTMLTable has trouble retrieving the content itself; in those cases it is worth trying getURL:

> library(XML)
> library(RCurl)
> URL <- getURL("https://es.finance.yahoo.com/q/cp?s=BEL20.BR")
> rt <- readHTMLTable(URL, header = TRUE)
> rt

You might need to adjust the header argument, and possibly others, but the tables are there.
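For example (a sketch only; the component name yfncsumtab is taken from the question's edit and may differ if Yahoo changes the page), the list returned by readHTMLTable is named, so you can inspect it and pull out the table you want:

> names(rt)                      # list the tables that were found
> bel20 <- rt[["yfncsumtab"]]    # extract the summary table by name
> str(bel20)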
