getURL (from RCurl package) doesn't work in a loop

I have a list of URLs named URLlist, and I loop over it to get the source code of each URL:

for (k in 1:length(URLlist)){
    temp = getURL(URLlist[k])
}

The problem is that for some seemingly random URLs, the code gets stuck and I get this error message:

Error in function (type, msg, asError = TRUE)  : 
    transfer closed with outstanding read data remaining

But when I call getURL outside the loop on a URL that failed, it works perfectly.

Any help, please? Thank you very much.

It's hard to tell for sure without more information, but the requests may simply be getting sent too quickly, in which case pausing between requests could help:

for (k in seq_along(URLlist)) {
    temp = getURL(URLlist[k])
    Sys.sleep(0.2)  # pause 0.2 seconds before the next request
}

I'm assuming that your actual code does something with 'temp' before overwriting it on the next iteration of the loop, and that whatever it does is very fast.

You could also try building in some error handling so that one failed request doesn't kill the whole loop. Here's a crude example that tries each URL twice before giving up:

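for (url in URLlist) {
    # First attempt: try() returns a "try-error" object instead of stopping the loop
    temp = try(getURL(url))
    if (inherits(temp, "try-error")) {
        # Second and final attempt
        temp = try(getURL(url))
        if (inherits(temp, "try-error")) {
            temp = paste("error accessing", url)
        }
    }
    Sys.sleep(0.2)
}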
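
If you want more than two attempts, or want to keep every result instead of overwriting 'temp', the same idea can be wrapped in a small helper. This is only a sketch under some assumptions: fetch_with_retry and fetch_all are hypothetical names, not part of RCurl; URLlist is assumed to be a character vector; and the wait that grows after each failure is my own choice, not something the error message calls for. The fetch argument defaults to RCurl::getURL, but any function that takes a URL and returns text will do, which also makes it easy to try without network access.

```r
# Hypothetical helper (not part of RCurl): retry a fetch a few times,
# pausing between attempts. `fetch` is any function(url) that returns text.
fetch_with_retry <- function(url, fetch = RCurl::getURL,
                             max_tries = 3, pause = 0.2) {
    stopifnot(max_tries >= 1)
    for (attempt in seq_len(max_tries)) {
        # Capture the error object instead of aborting the caller's loop.
        result <- tryCatch(fetch(url), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        # Assumed policy: wait a little longer after each failed attempt.
        if (attempt < max_tries) Sys.sleep(pause * attempt)
    }
    # Every attempt failed: return NA and keep the last error message.
    structure(NA_character_, error = conditionMessage(result))
}

# Hypothetical wrapper: one list entry per URL, named by the URL itself.
# Assumes `urls` is a character vector.
fetch_all <- function(urls, ...) {
    pages <- lapply(urls, fetch_with_retry, ...)
    names(pages) <- urls
    pages
}
```

With RCurl loaded, pages <- fetch_all(URLlist) would then give a named list in which a failed URL shows up as NA, with the last error message available through attr(pages[[url]], "error").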
