Download CSV file in R
I'm trying to download historical stock trading data from my country with R. I tried the download.file() function. A file does get downloaded, but it is an empty spreadsheet. If I open the same URL in my browser, the file I get is the one I want.
I would love to do it with quantmod, but that package only covers larger markets.
url <- "https://www.ccbolsa.cl/apps/script/detalleaccion/Transaccion.asp?Nemo=AFPCAPITAL&Menu=H"
destfile <- "/home/hector/TxHistoricas.xls"
download.file(url, destfile)
Thanks in advance.
You can jury-rig something like this if you don't want to use Selenium:
library(rvest)
library(httr)
library(stringr)
URL <- "https://www.ccbolsa.cl/apps/script/detalleaccion/Transaccion.asp?Nemo=AFPCAPITAL&Menu=H"
Get the initial URL:
res <- html_session(URL, timeout(30))
The page embeds a form that it submits with JavaScript, so grab the form's inputs:
inputs <- html_nodes(res, "input")
It uses the last JavaScript entry to do a redirect on page load, so we need its location:
scripts <- html_nodes(res, "script")
action <- html_text(scripts[[length(scripts)]])
This is the new URL to submit to:
base_url <- "https://www.ccbolsa.cl/apps/script/detalleaccion"
loc <- str_match(action, '\\.action *= *"(.*)"')[,2]
doc_url <- sprintf("%s/%s", base_url, loc)
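One thing to watch with that pattern: `(.*)` is greedy, so if the script line contains a second quoted string, the capture runs through to the last quote. Here is a minimal base-R sketch of the difference. The script text and the helper name `extract_action` are made up for illustration, not taken from the real page:

```r
# Hypothetical script text; the real page's JavaScript may look different.
script_text <- 'document.f.action = "export.asp"; document.f.target = "_self";'

# Hypothetical helper: return the first capture group, or NA if no match.
extract_action <- function(text, pattern) {
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Greedy capture (the pattern used above) runs to the last quote on the line.
greedy <- extract_action(script_text, '\\.action *= *"(.*)"')

# Excluding quotes from the capture stops at the first closing quote.
strict <- extract_action(script_text, '\\.action *= *"([^"]*)"')

print(greedy)
print(strict)
```

If the script on the page only ever has one quoted string on that line, both patterns give the same result, so the original works as-is in that case.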
Gather up all the query params:
query <- lapply(inputs, xml_attr, "value")
names(query) <- sapply(inputs, xml_attr, "name")
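To see what that produces, here is a small offline sketch on an inline form. The HTML below is an assumption made up for the example (the field names are borrowed from the URL's query string), and it additionally drops inputs that have no `name` attribute, since those would otherwise end up with `NA` names in the list:

```r
library(rvest)

# Hypothetical form markup; the real page's inputs may differ.
html <- '<form>
  <input type="hidden" name="Nemo" value="AFPCAPITAL">
  <input type="hidden" name="Menu" value="H">
  <input type="submit" value="Go">
</form>'

inputs <- html_nodes(read_html(html), "input")

# html_attr() is vectorised over the node set and returns NA for
# attributes that are missing.
input_names  <- html_attr(inputs, "name")
input_values <- html_attr(inputs, "value")

# Keep only inputs that actually carry a name.
keep  <- !is.na(input_names)
query <- as.list(setNames(input_values[keep], input_names[keep]))

str(query)
```

Whether the filtering matters depends on the page; if every input is named, this gives the same list as the two lines above.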
Now we have to make a new POST request with the query encoded as "form", passing the original URL as the Referer header (the timeout was necessary for me). This writes the "xls" content to a file:
ret <- POST(doc_url,
            body=query,
            encode="form",
            add_headers(Referer=URL),
            write_disk("fil.xls", overwrite=TRUE),
            timeout(30))
It says it's an XLS file:
ret$headers$`content-type`
## [1] "application/vnd.ms-excel"
but it's really an HTML table, so you can simply do:
ret <- POST(doc_url,
            body=query,
            encode="form",
            add_headers(Referer=URL),
            timeout(30))
doc <- read_html(content(ret, as="text"))
dat <- html_table(html_nodes(doc, "table"), fill=TRUE)
to get what you're looking for (there are two ugly tables in the dat list, and you may want to use header=TRUE as an additional parameter to html_table).
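As a quick illustration of what header=TRUE does, here is an offline sketch on a tiny inline table. The column names and values are invented for the example and are not the real site's data:

```r
library(rvest)

# Hypothetical table whose first row holds the column titles in <td> cells.
html <- "<table>
  <tr><td>Fecha</td><td>Precio</td></tr>
  <tr><td>2016-01-04</td><td>1500</td></tr>
</table>"

tables <- html_nodes(read_html(html), "table")

# header = TRUE promotes the first row to column names.
tbl <- html_table(tables, header = TRUE)[[1]]

print(names(tbl))
print(nrow(tbl))
```

Without header=TRUE the title row would typically be kept as an ordinary data row, because the cells are `<td>` rather than `<th>`.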
I am not sure how "dynamic" this solution is, but that's testable/verifiable.
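If you do write the response to disk and want to check whether it is a real binary XLS or just HTML with an Excel content type, one rough test is to look at the first bytes of the file. Legacy .xls files are OLE2 compound documents and start with a fixed 8-byte signature. A minimal base-R sketch follows; the helper name `is_ole2_xls` and the temp files are made up for the example:

```r
# Hypothetical helper: TRUE if the file starts with the OLE2 signature
# used by legacy binary .xls files.
is_ole2_xls <- function(path) {
  magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  head_bytes <- readBin(path, what = "raw", n = length(magic))
  identical(head_bytes, magic)
}

# Stand-in for a download that is really an HTML table.
html_file <- tempfile(fileext = ".xls")
writeLines("<html><body><table><tr><td>1</td></tr></table></body></html>",
           html_file)

# Stand-in for a file that begins with the binary signature.
bin_file <- tempfile(fileext = ".xls")
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)), bin_file)

html_looks_binary <- is_ole2_xls(html_file)
bin_looks_binary  <- is_ole2_xls(bin_file)

print(html_looks_binary)
print(bin_looks_binary)
```

If the check comes back FALSE for fil.xls, parsing it with read_html() and html_table() as shown above is probably the better route than an Excel reader.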