Downloading large files with R/RCurl efficiently

Many examples of downloading binary files with RCurl look like this:

library("RCurl")
curl = getCurlHandle()
bfile = getBinaryURL(
        "http://www.example.com/bfile.zip",
        curl = curl,
        progressfunction = function(down, up) {print(down)}, noprogress = FALSE
)
writeBin(bfile, "bfile.zip")
rm(curl, bfile)

If the download is very large, I suppose it would be better to write it to disk as it arrives, instead of holding the whole file in memory.

The RCurl documentation has some examples that fetch files in chunks and process them as they are downloaded, but they all seem to deal with text chunks.

Can you give a working example?

UPDATE

A user suggests using R's native download.file function with the mode = 'wb' option for binary files.

In many cases the native function is a viable alternative, but it does not fit a number of use cases (HTTPS, cookies, forms, etc.), which is why RCurl exists.

This is the working example:

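library(RCurl)
# open the destination file in binary write mode
f = CFILE("bfile.zip", mode="wb")
# stream the response body directly into the file
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
# flush and close the file handle
close(f)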

It will download straight to file. The returned value is the status of the request rather than the downloaded data (0 if no errors occur).
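Since the call returns a status rather than the data, it can be worth checking it. The following is only a sketch: the helper name safe_bdown and its cleanup policy (deleting a partial file on failure) are my own assumptions and not part of RCurl. It also assumes that RCurl signals transfer failures as R errors, which is why the call is wrapped in tryCatch.

```r
library(RCurl)

# Hypothetical helper (not part of RCurl): download to `file`, report success,
# and delete the partial file if the transfer fails.
safe_bdown <- function(url, file) {
    f <- CFILE(file, mode = "wb")
    status <- tryCatch(
        curlPerform(url = url, writedata = f@ref),
        error = function(e) {
            # assumption: failed transfers surface as R errors
            message("Download failed: ", conditionMessage(e))
            NA_integer_
        },
        finally = close(f)   # always release the file handle
    )
    ok <- isTRUE(all(status == 0))
    if (!ok && file.exists(file)) unlink(file)
    invisible(ok)
}
```

Calling safe_bdown("http://www.example.com/bfile.zip", "bfile.zip") would then yield TRUE on success and FALSE otherwise.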

The mention of CFILE in the RCurl manual is a bit terse. Hopefully it will include more details and examples in the future.

For your convenience, the same code is packaged as a function (with a progress bar):

bdown=function(url, file){
    library('RCurl')
    f = CFILE(file, mode="wb")
    a = curlPerform(url = url, writedata = f@ref, noprogress=FALSE)
    close(f)
    return(a)
}

## ...and now just give remote and local paths     
ret = bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")
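The cases that motivate RCurl over the native function (HTTPS, cookies, redirects) usually need extra curl options. One possible variation, sketched here under the assumption that forwarding options through ... is acceptable for your use, passes them straight to curlPerform. The name bdown_opts is hypothetical; followlocation, cookiefile and ssl.verifypeer are standard curl option names as exposed by RCurl.

```r
library(RCurl)

# Hypothetical variant of bdown(): any extra arguments are forwarded
# to curlPerform() as curl options.
bdown_opts <- function(url, file, ...) {
    f <- CFILE(file, mode = "wb")
    on.exit(close(f))   # close the file even if the transfer errors out
    curlPerform(url = url, writedata = f@ref, ...)
}

# Example call (URL is a placeholder):
# ret <- bdown_opts("https://www.example.com/bfile.zip", "bfile.zip",
#                   followlocation = TRUE,   # follow HTTP redirects
#                   cookiefile = "",         # enable the in-memory cookie engine
#                   ssl.verifypeer = TRUE)   # verify the server certificate
```

Using on.exit here means the file handle is released on both success and failure, which the plain version does not guarantee when curlPerform raises an error.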

Um.. use mode = 'wb' :) Run this and follow along with my comments.

# create a temporary file and a temporary directory on your local disk
tf <- tempfile()
td <- tempdir()

# run the download file function, download as binary..  save the result to the temporary file
download.file(
    "http://sourceforge.net/projects/peazip/files/4.8/peazip_portable-4.8.WINDOWS.zip/download",
    tf ,
    mode = 'wb' 
)

# unzip the files to the temporary directory
files <- unzip( tf , exdir = td )

# here are your files
files
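For large files, one thing to watch with download.file is the timeout option, which defaults to 60 seconds. A minimal sketch of a wrapper that raises it temporarily follows; the function name download_binary and the 600-second default are illustrative assumptions, not anything from the answer above.

```r
# Hypothetical wrapper around base R's download.file() for large binary files.
download_binary <- function(url, destfile, timeout_sec = 600) {
    old <- options(timeout = timeout_sec)   # raise the timeout for this call only
    on.exit(options(old))                   # restore the previous setting
    status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
    if (status != 0) stop("download failed with status ", status)
    invisible(destfile)
}

# Example call (URL is a placeholder):
# download_binary("http://www.example.com/bfile.zip", tempfile(fileext = ".zip"))
```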
