How to download a large binary file with RCurl *after* server authentication
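I originally asked this question about doing this task with the httr package, but I don't think it's possible with httr. So I rewrote my code to use RCurl, but I'm still tripping over something probably related to the writefunction... and I really don't understand why.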
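You should be able to reproduce my work with the 32-bit version of R, where you will hit the memory limit if you read anything into RAM. I need a solution that downloads directly to the hard disk.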
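What is being asked for is streaming: each chunk is written to disk as soon as it arrives, so memory use stays at one chunk no matter how large the file is. A minimal base-R sketch of that pattern on a local file (copy_in_chunks is a hypothetical helper; this is a file copy, not an HTTP download, so no RCurl is involved):

```r
# Hypothetical helper: copy `src` to `dest` in fixed-size binary chunks so
# that at most `chunk_size` bytes are held in memory at any time.
copy_in_chunks <- function(src, dest, chunk_size = 65536L) {
  inp <- file(src, "rb")
  on.exit(close(inp), add = TRUE)
  out <- file(dest, "wb")
  on.exit(close(out), add = TRUE)
  total <- 0
  repeat {
    chunk <- readBin(inp, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break  # end of input
    writeBin(chunk, out)            # write this chunk before reading the next
    total <- total + length(chunk)
  }
  total
}

# Demo on ~200 KB of random bytes (including NUL bytes).
src <- tempfile()
dest <- tempfile()
set.seed(1)
writeBin(as.raw(sample(0:255, 200000, replace = TRUE)), src)
n <- copy_in_chunks(src, dest)
print(n)
```

The chunk size only bounds memory use; the resulting file is identical whatever value is chosen.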
First, this code works fine: the zip file is saved to disk properly.
library(RCurl)
filename <- tempfile()
f <- CFILE(filename, "wb")
url <- "http://www2.census.gov/acs2011_5yr/pums/csv_pus.zip"
curlPerform(url = url, writedata = f@ref)
close(f)
# 2.1 GB file successfully written to disk
Now here is some RCurl code. As mentioned in the previous question, reproducing it requires creating an extract on IPUMS.
your.email <- "email@address.com"
your.password <- "password"
extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"
library(RCurl)
values <-
list(
"login[email]" = your.email ,
"login[password]" = your.password ,
"login[is_for_login]" = 1
)
curl = getCurlHandle()
curlSetOpt(
cookiejar = 'cookies.txt',
followlocation = TRUE,
autoreferer = TRUE,
ssl.verifypeer = FALSE,
curl = curl
)
params <-
list(
"login[email]" = your.email ,
"login[password]" = your.password ,
"login[is_for_login]" = 1
)
html <- postForm("https://usa.ipums.org/usa-action/users/validate_login", .params = params, curl = curl)
dl <- getURL( "https://usa.ipums.org/usa-action/extract_requests/download" , curl = curl)
Now that I'm logged in, I try the same commands as above, but pass the curl object so that the cookies are kept.
filename <- tempfile()
f <- CFILE(filename, mode = "wb")
This line breaks:
curlPerform(url = extract.path, writedata = f@ref, curl = curl)
close(f)
# the error is:
Error in curlPerform(url = extract.path, writedata = f@ref, curl = curl) :
embedded nul in string: [[binary gibberish here]]
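The message itself comes from R rather than from libcurl: an R character string cannot contain a NUL (0x00) byte, and compressed data such as a .csv.gz file is full of them. A minimal base-R sketch (no RCurl, no network; the ten bytes are a typical gzip header chosen purely for illustration) reproduces the same error by converting binary data to text, and shows that writing the same bytes as raw data is fine:

```r
# Ten bytes of a typical gzip header: ID1 ID2 CM FLG, MTIME (4 bytes), XFL, OS.
# FLG, MTIME and XFL are zero here, so the bytes contain embedded NULs.
gz_header <- as.raw(c(0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03))

# Converting the bytes to a character string fails with the same message
# that curlPerform() reported.
msg <- tryCatch(rawToChar(gz_header), error = function(e) conditionMessage(e))
print(msg)

# Writing the same bytes as raw data works and round-trips exactly.
path <- tempfile(fileext = ".gz")
writeBin(gz_header, path)
round_trip <- readBin(path, what = "raw", n = 100L)
print(identical(round_trip, gz_header))
```

If that reading is right, the response body is being routed through an R-level, text-based write callback instead of straight into the CFILE. One plausible source, which is an assumption and not something the question confirms: getURL() installs its own text-gathering writefunction on the handle it is given, libcurl keeps options set on a handle for later requests, and so the following curlPerform() call, which only sets writedata, may still hand each chunk to that R function.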
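The answer to my previous post pointed me to this answer with a C-level write function, but I'm clueless about how to recreate the curl_writer C program (on Windows?).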
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE="curl_writer")$address
curlPerform(URL=url, writefunction=writer)
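...or why it is even necessary, since the five lines of code at the top of this question need nothing as crazy as getNativeSymbolInfo. I just don't understand why passing in the extra curl object, which stores the authentication/cookies and tells it not to verify SSL, would cause code that otherwise works to... break?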
Create a file named curl_writer.c containing the code from the linked answer, and save it to C:\<folder where you save your R files>:
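#include <stdio.h>

/*
 * libcurl write callback: append each chunk of the response body to the
 * FILE* passed in as writedata (here, the CFILE opened from R).
 * Original code just sent some message to stderr.
 */
size_t writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
    /* fwrite() returns the number of items written; report the number of
       bytes written so that libcurl aborts the transfer if the disk write fails. */
    size_t written = fwrite(buffer, size, nmemb, (FILE *)stream);
    return written * size;
}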
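The C writer() follows libcurl's write-callback contract: it is handed size * nmemb bytes per call and must return the number of bytes it actually handled; if that count differs from what it was given, libcurl aborts the transfer with a write error. A toy R simulation of that contract (fake_transfer and writer_r are hypothetical names; nothing here calls libcurl or RCurl, and RCurl's own R-level callbacks behave differently):

```r
# Hypothetical stand-in for libcurl's transfer loop: deliver `body` to
# `callback` in chunks and abort, as libcurl does with CURLE_WRITE_ERROR,
# when the callback reports a byte count different from the chunk size.
fake_transfer <- function(body, callback, chunk_size = 4L) {
  for (i in seq(1L, length(body), by = chunk_size)) {
    chunk <- body[i:min(i + chunk_size - 1L, length(body))]
    if (callback(chunk) != length(chunk)) return("aborted: write error")
  }
  "ok"
}

body <- as.raw(c(0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03))

# R analogue of the C writer(): write the chunk as raw bytes, report all consumed.
out <- tempfile()
con <- file(out, "wb")
writer_r <- function(chunk) {
  writeBin(chunk, con)
  length(chunk)
}
status <- fake_transfer(body, writer_r)
close(con)
print(status)

# A callback that claims to have written nothing makes the transfer abort.
print(fake_transfer(body, function(chunk) 0L))
```

Because the bytes go straight from the callback to the file as raw data, NUL bytes never pass through an R character string.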
Open a command window, change to the folder where you saved curl_writer.c, and run the R compiler:
C:\> cd "C:\<folder where you save your R files>"
C:\> R CMD SHLIB -o curl_writer.dll curl_writer.c
Open R and run the script:
C:\> R

your.email <- "email@address.com"
your.password <- "password"
extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"

library(RCurl)

values <- list(
  "login[email]" = your.email,
  "login[password]" = your.password,
  "login[is_for_login]" = 1
)

curl = getCurlHandle()
curlSetOpt(
  cookiejar = 'cookies.txt',
  followlocation = TRUE,
  autoreferer = TRUE,
  ssl.verifypeer = FALSE,
  curl = curl
)

params <- list(
  "login[email]" = your.email,
  "login[password]" = your.password,
  "login[is_for_login]" = 1
)

html <- postForm("https://usa.ipums.org/usa-action/users/validate_login", .params = params, curl = curl)
dl <- getURL("https://usa.ipums.org/usa-action/extract_requests/download", curl = curl)

# Load the DLL you created
# "writer" is the name of the function
# "curl_writer" is the name of the dll
dyn.load("curl_writer.dll")
writer <- getNativeSymbolInfo("writer", PACKAGE = "curl_writer")$address

# Note that the "URL" parameter is upper case; in your code it is lowercase.
# I'm not sure whether that matters.
# "writer" is the symbol defined above; extract.path is the IPUMS file to download.
filename <- tempfile()
f <- CFILE(filename, "wb")
curlPerform(URL = extract.path, writedata = f@ref, writefunction = writer, curl = curl)
close(f)
This can now be done with the httr package. Thanks, Hadley!