Downloading a CSV file in R

Question

I'm trying to download historical stock trading data for my country with R. I tried the download.file() function, and a file is indeed downloaded, but it is an empty spreadsheet. If I open the same URL in my browser, the file I get is exactly the one I want.

I would love to do it with quantmod, but that package only covers larger markets.

url<-"https://www.ccbolsa.cl/apps/script/detalleaccion/Transaccion.asp?Nemo=AFPCAPITAL&Menu=H"
destfile <- "/home/hector/TxHistoricas.xls"
download.file(url, destfile)
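To see what download.file() actually saved, you can look at the file's first bytes instead of trusting the .xls extension. A minimal base-R sketch; sniff_file_format() is a hypothetical helper, and it only tells apart a legacy OLE2 .xls, a zip-based .xlsx, HTML, and an empty file:

```r
# Hypothetical helper: guess a file's real format from its leading bytes
sniff_file_format <- function(path) {
  bytes <- readBin(path, what = "raw", n = 512L)
  if (length(bytes) == 0L) return("empty")
  # Legacy .xls files are OLE2 compound documents with this 8-byte signature
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(bytes) >= 8L && identical(bytes[1:8], ole2)) return("xls (OLE2)")
  # .xlsx files are zip archives, which start with "PK"
  if (length(bytes) >= 2L && identical(bytes[1:2], charToRaw("PK"))) return("xlsx (zip)")
  # Otherwise treat the bytes as text and look for common HTML tags
  text <- rawToChar(bytes[bytes != as.raw(0)])
  if (grepl("<(!doctype|html|head|script|form|table)", text,
            ignore.case = TRUE, useBytes = TRUE)) return("html")
  "unknown"
}

# Demo on a temporary file containing a JavaScript redirect page
tmp <- tempfile(fileext = ".xls")
writeLines('<html><script>document.forms[0].submit();</script></html>', tmp)
sniff_file_format(tmp)  # "html"
```

Calling sniff_file_format(destfile) right after the download.file() call above would show whether the "empty spreadsheet" is really a web page.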

Thanks in advance.

Answer 1

You can jury-rig something like this if you don't want to use Selenium:

library(rvest)
library(httr)
library(stringr)
library(xml2)  # for xml_attr()

URL <- "https://www.ccbolsa.cl/apps/script/detalleaccion/Transaccion.asp?Nemo=AFPCAPITAL&Menu=H"

Get the initial URL:

res <- html_session(URL, timeout(30))  # in rvest >= 1.0 this function is called session()

The page embeds a form that it submits with JavaScript, so collect the form's input fields:

inputs <- html_nodes(res, "input")

The last <script> element performs a redirect on page load, so we need its target location:

scripts <- html_nodes(res, "script")
action <- html_text(scripts[[length(scripts)]])

This is the new URL to submit to:

base_url <- "https://www.ccbolsa.cl/apps/script/detalleaccion"
loc <- str_match(action, '\\.action *= *"(.*)"')[,2]
doc_url <- sprintf("%s/%s", base_url, loc)
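If you'd rather avoid stringr, the same extraction can be done with base R's regexec()/regmatches(). This is a sketch: extract_action() and build_doc_url() are hypothetical helpers, the sample script line and Transaccion_xls.asp are made up (the real page's script will differ), and the capture group is ([^"]*) instead of (.*) so it stops at the first closing quote:

```r
# Hypothetical helper: base-R version of the str_match() step
extract_action <- function(script_text) {
  # [^"]* stops at the first closing quote, unlike the greedy (.*)
  m <- regmatches(script_text,
                  regexec('\\.action *= *"([^"]*)"', script_text))[[1]]
  # No match gives character(0); a match gives c(full_match, capture)
  if (length(m) < 2L) NA_character_ else m[2]
}

# Hypothetical helper: join the base path and the extracted relative location
build_doc_url <- function(base_url, loc) sprintf("%s/%s", base_url, loc)

# Made-up script text standing in for the page's last <script> element
script <- 'document.forms[0].action = "Transaccion_xls.asp"; document.forms[0].submit();'
loc <- extract_action(script)
build_doc_url("https://www.ccbolsa.cl/apps/script/detalleaccion", loc)
# "https://www.ccbolsa.cl/apps/script/detalleaccion/Transaccion_xls.asp"
```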

Gather up all the query parameters:

query <- lapply(inputs, xml_attr, "value")
names(query) <- sapply(inputs, xml_attr, "name")
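For reference, a self-contained sketch of the form-to-list step using xml2 directly. form_inputs_to_list() is a hypothetical helper and the HTML is a made-up stand-in for the real page (the Filtro input is invented). Unlike the lapply() version, it replaces missing value attributes with empty strings and skips inputs that have no name, roughly mirroring what a browser submits:

```r
library(xml2)

# Hypothetical helper: collect a form's <input> name/value pairs as a named list
form_inputs_to_list <- function(doc) {
  inputs <- xml_find_all(doc, ".//input")
  nms  <- xml_attr(inputs, "name")   # NA when an input has no name
  vals <- xml_attr(inputs, "value")  # NA when an input has no value
  vals[is.na(vals)] <- ""            # send "" instead of NA
  keep <- !is.na(nms)                # unnamed inputs are not submitted
  stats::setNames(as.list(vals[keep]), nms[keep])
}

# Made-up page fragment standing in for the real response
page <- read_html(paste0(
  '<form>',
  '<input type="hidden" name="Nemo" value="AFPCAPITAL">',
  '<input type="hidden" name="Menu" value="H">',
  '<input type="text" name="Filtro">',
  '<input type="submit">',
  '</form>'
))
query <- form_inputs_to_list(page)
str(query)
```

Whether the server cares about these details is untested; the lapply() version may work just as well.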

Now make a new POST request with the query form-encoded, sending the original URL as the Referer header (the timeout was necessary for me). This writes the "xls" content to a file:

ret <- POST(doc_url, 
            body=query, 
            encode="form",
            add_headers(Referer=URL),
            write_disk("fil.xls", overwrite=TRUE),
            timeout(30))
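Roughly, encode="form" sends an application/x-www-form-urlencoded body: key=value pairs joined by &, with reserved characters percent-encoded. The hypothetical form_encode() below only illustrates that format with base R's URLencode(); it is not httr's internal code:

```r
# Hypothetical helper illustrating an application/x-www-form-urlencoded body
form_encode <- function(params) {
  # reserved = TRUE also escapes characters such as & = + / and spaces;
  # repeated = TRUE escapes a literal "%" even if the value looks pre-encoded
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE, repeated = TRUE)
  keys <- vapply(names(params), enc, character(1), USE.NAMES = FALSE)
  vals <- vapply(params, enc, character(1), USE.NAMES = FALSE)
  paste(keys, vals, sep = "=", collapse = "&")
}

form_encode(list(Nemo = "AFPCAPITAL", Menu = "H"))  # "Nemo=AFPCAPITAL&Menu=H"
form_encode(list(q = "a b&c"))                      # "q=a%20b%26c"
```

httr builds the body for you; a sketch like this is only handy for eyeballing what the server receives.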

It says it's an XLS file:

ret$headers$`content-type`
## [1] "application/vnd.ms-excel"

but it's really an HTML table, so you can just do:

ret <- POST(doc_url, 
            body=query, 
            encode="form",
            add_headers(Referer=URL),
            timeout(30))

doc <- read_html(content(ret, as="text"))
dat <- html_table(html_nodes(doc, "table"), fill=TRUE)

to get what you're looking for. There are two ugly tables in the dat list, and you may want to pass header=TRUE as an additional argument to html_table().
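A self-contained illustration of the header=TRUE suggestion, run on a made-up table (the column names and values are invented, not the site's real output):

```r
library(rvest)

# Invented stand-in for one of the tables in the server's "xls" response
html <- paste0(
  "<table>",
  "<tr><td>Fecha</td><td>Cierre</td></tr>",
  "<tr><td>09/10/2015</td><td>1200</td></tr>",
  "<tr><td>08/10/2015</td><td>1195</td></tr>",
  "</table>"
)
doc <- read_html(html)

# header = TRUE turns the first row into column names
tabs <- html_table(html_nodes(doc, "table"), header = TRUE)
tab <- tabs[[1]]
print(tab)
```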

I'm not sure how "dynamic" this solution is, but that's testable/verifiable.

Answer 1 was accepted on 2015-10-12 12:05:12 (score: 0).