
Silent failure of s3multiput (boto) upload to s3 from EC2 instance

I'm trying to automate a process that collects data on one or more AWS instances and uploads it to S3 hourly, where a decoupled process retrieves it for parsing and further action. As a first step, I whipped up a crontab-initiated shell script (running on Ubuntu 12.04 LTS) that calls the boto utility s3multiput.

For the most part this works fine, but very occasionally (maybe once a week) the file fails to appear in the S3 bucket, and I can't see any error or exception that would help me track down why.

I'm using the s3multiput utility included with boto 2.6.0. Python 2.7.3 is the default python on the instance. I have an IAM Role assigned to the instance to provide AWS credentials to boto.

I have a crontab entry calling a script that calls a wrapper that calls s3multiput. I included the -d 1 flag on the s3multiput call and redirected all output of the crontab job with 2>&1, but the report for the hour that's missing data looks just like the reports for the hour before and the hour after, both of which succeeded.
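To illustrate the setup described above, here is a minimal sketch of a wrapper that records the exit status of the upload command along with its combined stdout and stderr. `run_logged` is a hypothetical helper name, not part of boto, and the sketch assumes the upload command signals failure through a nonzero exit status, which has not been confirmed for s3multiput 2.6.0.

```python
import logging
import subprocess
import sys


def run_logged(cmd, log=None):
    """Run cmd (a list of arguments), log its exit status and output.

    Returns a (returncode, output_text) tuple. Hypothetical helper for
    illustration; it only exposes failures that the command itself reports.
    """
    log = log or logging.getLogger("upload")
    # Merge stderr into stdout, the in-process equivalent of 2>&1.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    text = output.decode("utf-8", "replace")

    log.info("command=%r exit=%d", cmd, proc.returncode)
    for line in text.splitlines():
        log.info("output: %s", line)
    if proc.returncode != 0:
        log.error("command failed with exit status %d", proc.returncode)
    return proc.returncode, text


def main(argv):
    """Run the command given in argv and return its exit status."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        stream=sys.stderr,
    )
    if not argv:
        sys.stderr.write("usage: wrapper.py COMMAND [ARG...]\n")
        return 2
    code, _ = run_logged(argv)
    return code
```

Saved as a script ending in `sys.exit(main(sys.argv[1:]))`, this could sit between cron and the upload command, so every hourly report carries a timestamp and an explicit exit status rather than only whatever the utility chose to print.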

So, 99% of the time this works, but when it fails I don't know why, and I'm having trouble figuring out where to look. I only find out about the failure later, when the parser job tries to pull the data from the bucket and it's not there. The data is safe and sound in the directory it should have been uploaded from, so I can upload it manually, but I would rather not have to.

I'm happy to post the ~30-40 lines of related code if helpful, but wondered if anybody else had run into this and it sounded familiar.

Some grand day I'll come back to this part of the pipeline and rewrite it in Python to obviate s3multiput, but we just don't have dev time for that yet.

How can I investigate what's going wrong here with the s3multiput upload?

First, I would try updating boto; a commit to the development branch mentions logging when a multipart upload fails. Note that doing so will require using s3put instead, as s3multiput is being folded into s3put.
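Independently of the boto version, the failure can be surfaced at upload time instead of when the parser runs, by checking the bucket right after each upload. The following is a minimal sketch under some assumptions: `verify_upload` and `open_bucket` are hypothetical helper names, and the bucket name, key name, and local path are whatever the cron job already uses. It relies on boto 2's `Bucket.get_key`, which returns `None` when the key does not exist.

```python
import os


def verify_upload(bucket, key_name, local_path):
    """Check that key_name exists in bucket and matches the local file size.

    bucket is expected to behave like a boto 2 Bucket: get_key() returns
    None for a missing key, otherwise an object with a .size attribute.
    Returns an (ok, reason) tuple.
    """
    key = bucket.get_key(key_name)
    if key is None:
        return False, "missing key: %s" % key_name

    local_size = os.path.getsize(local_path)
    if key.size != local_size:
        return False, "size mismatch: local=%d remote=%d" % (
            local_size, key.size)
    return True, "ok"


def open_bucket(bucket_name):
    """Open a bucket with boto 2, using whatever credentials boto finds
    (for example the instance's IAM role)."""
    import boto  # imported here so verify_upload has no hard dependency
    return boto.connect_s3().get_bucket(bucket_name)
```

A cron script could call `verify_upload(open_bucket(name), key_name, path)` after the upload step and exit with a nonzero status, or retry, when the first element of the result is `False`. A size comparison is only a cheap sanity check; it does not prove the contents are identical.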
