

Error Handling: Boto: [Error 104] Connection Reset by Peer

I have a script that downloads from Amazon S3. It works 99.9% of the time, but occasionally I get the following error: `socket.error: [Errno 104] Connection reset by peer`. Once I restart the code the error seems to go away, and it is hard to reproduce. I'm hoping the snippet of code below will handle the error; specifically, if the error comes up, the script should try to re-download the file. I'm wondering whether this code will work, and whether there is anything else I should add. I'm thinking an error counter might be good, so that if the error does keep coming up the script will eventually move on. (I'm not exactly sure how to add a counter.)

import socket  # needed to catch socket.error

files = [...]  # list of files to download

for file in files:
    for keys in bucket.list(prefix=file):
        while True:
            try:
                # get_contents_to_filename() requires a destination path;
                # local_path here stands for wherever the file should land
                keys.get_contents_to_filename(local_path)
            except socket.error:
                continue  # retry the download
            break
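Since the question asks how to add an error counter, here is one minimal sketch of it: a helper that retries a download until it succeeds or a fixed number of socket errors have occurred. The `fetch` callable is a stand-in of my own (not part of the boto API) for a call like `keys.get_contents_to_filename(path)`.

```python
import socket


def fetch_with_counter(fetch, max_retries=5):
    """Call fetch() until it succeeds or max_retries socket errors occur.

    fetch is any zero-argument callable that raises socket.error on a
    connection reset -- e.g. lambda: keys.get_contents_to_filename(path).
    Re-raises the last socket.error once the retries are used up, so the
    outer loop can catch it and move on to the next file.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except socket.error:
            if attempt == max_retries:
                raise  # give up; let the caller decide what to do
```

In the loop above you would then wrap the call in a `try/except socket.error` and skip the key when the counter runs out.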

I had exactly the same problem. If you search boto on GitHub, you will see we are not alone.

There's also a known, accepted issue: https://github.com/boto/boto/issues/2207

Reaching performance limits of AWS S3

The truth is that we got so used to boto and the AWS S3 service that we have forgotten these are really distributed systems, which might break in some cases.

I was archiving (download, tar, upload) a huge number of files (about 3 years of around 15 feeds, each having about 1440 versions a day) and using Celery to do this faster. And I have to say that I was sometimes getting these errors more often, probably because I was reaching the performance limits of AWS S3. The errors often appeared in chunks (in my case I was uploading at about 60 Mbps for a couple of hours).

Training S3 performance

When I was measuring performance, it got "trained". After some hours, the responsiveness of the S3 bucket jumped up; AWS had probably detected the higher load and spun up some more instances to serve it.

Try the latest stable version of boto

The other thing is that boto tries to retry in many cases, so many failures are hidden from our calls. Sometimes I got somewhat better behavior by upgrading to the latest stable version.

My conclusions are:

  • try upgrading to the latest stable boto
  • when the error rate grows, lower the pressure
  • accept the fact that AWS S3 is a distributed service with rare performance problems

In your code, I would definitely recommend adding some sleep (at least 5 s, though 30 s would seem fine to me); otherwise you are just pushing harder and harder on a system which might be in a shaky state at the moment.
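The sleep advice above can be sketched as a small backoff wrapper. Again, `fetch` is a hypothetical stand-in for the boto download call, and the delays (starting at 5 s, doubling, capped at 30 s) follow the range suggested above rather than any official recommendation.

```python
import socket
import time


def fetch_with_backoff(fetch, retries=4, base_delay=5.0, max_delay=30.0,
                       sleep=time.sleep):
    """Retry fetch() after socket errors, pausing longer each time.

    The pause starts at base_delay seconds and doubles on each failure,
    capped at max_delay, so a shaky S3 endpoint gets breathing room
    instead of an immediate hammering. The sleep parameter is injectable
    only to make the helper easy to test.
    """
    delay = base_delay
    for attempt in range(retries):
        try:
            return fetch()
        except socket.error:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Compared with a bare `continue`, this backs off instead of retrying in a tight loop, which is exactly the "lower the pressure" point from the conclusions.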

Well, it appeared the time.sleep() worked for a while. But now that the files are bigger, that doesn't even do the trick. It seems like I need to restart the loop to get it working again. This modification seems to be working:

import socket
import time

def download(filesToDownload):
    temp = []
    for sFile in filesToDownload:
        for keys in bucket.list(prefix='<bucket>%s' % (sFile)):
            while True:
                try:
                    keys.get_contents_to_filename('%s%s' % (downloadRoot, sFile))
                    temp.append(sFile)
                except socket.error:  # catch the reset, not every exception
                    time.sleep(30)
                    # restart the loop over whatever is still missing
                    x = set(filesToDownload) - set(temp)
                    download(x)
                break

I also ran into this. I had success adding in a retry:

import boto3
from botocore.config import Config  # Config lives in botocore

client = boto3.client(
    's3',
    config=Config(retries={'max_attempts': 3})
)

I ran into this problem once, and the fix was to create a new access key, because the old one had been compromised.
