How to check whether a file has finished uploading to an S3 bucket using Boto in Python?

Question

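I am trying to upload an image to an S3 bucket using Boto. Once the image has uploaded successfully, I want to perform some operations using the file URL of the image in the S3 bucket. The problem is that sometimes the image does not upload quickly enough, and the server ends up throwing an error when I try to perform the operation that depends on the image's file URL.

Here is my source code. I am using Python Flask.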

def search_test(consumer_id):

    consumer = session.query(Consumer).filter_by(consumer_id=consumer_id).one()
    products = session.query(Product).all()
    product_dictionary = {'Products': [p.serialize for p in products]}

    if request.method == 'POST':
        p_product_image_url = request.files['product_upload_url']
        s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
        bucket = s3.get_bucket(AWS_BUCKET_NAME)
        k = Key(bucket)
        if p_product_image_url and allowed_file(p_product_image_url.filename):

            # Read the contents of the file
            file_content = p_product_image_url.read()

            # Use Boto to upload the file to S3
            # guess_type returns a (type, encoding) tuple; only the type goes in the header
            k.set_metadata('Content-Type', mimetypes.guess_type(p_product_image_url.filename)[0])
            k.key = secure_filename(p_product_image_url.filename)
            k.set_contents_from_string(file_content)
            print('consumer search upload successful')

        new_upload = Uploads(picture_upload_url=k.key.replace(' ', '+'), consumer=consumer)
        session.add(new_upload)
        session.commit()

        new_result = jsonify(Result=perform_actual_search(amazon_s3_base_url + k.key.replace(' ', '+'),
                                                          product_dictionary))

        return new_result
    else:
        return render_template('upload_demo.html', consumer_id=consumer_id)

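The jsonify call needs a valid image URL to perform the operation. It works sometimes and fails at other times. I suspect the image has not finished uploading by the time that line executes.

The perform_actual_search method is as follows: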

def get_image_search_results(image_url):
    global description
    url = ('http://style.vsapi01.com/api-search/by-url/?apikey=%s&url=%s' % (just_visual_api_key, image_url))
    h = httplib2.Http()
    # alternatively: content = h.request(url, 'GET')[1] (the second element of the returned tuple)
    response, content = h.request(url, 'GET')
    result = json.loads(content)

    result_dictionary = []

    for i in range(0, 10):
        if result:
            try:
                if result['errorMessage']:
                    result_dictionary = []
            except:
                # No 'errorMessage' key in the response, so read the images
                pass

                if result['images'][i]:
                    images = result['images'][i]
                    jv_img_url = images['imageUrl']
                    title = images['title']
                    try:
                        if images['description']:
                            description = images['description']
                        else:
                            description = "no description"
                    except:
                        pass

                    # print("\njv_img_url: %s,\ntitle: %s,\ndescription: %s\n\n"% (
                    # jv_img_url, title, description))

                    image_info = {
                        'image_url': jv_img_url,
                        'title': title,
                        'description': description,
                    }
                    result_dictionary.append(image_info)

    if result_dictionary != []:
        # for i in range(len(result_dictionary)):
        #     print (result_dictionary[i])
        #     print("\n\n")
        return result_dictionary
    else:
        return []


def performSearch(jv_input_dictionary, imagernce_products_dict):
    print(jv_input_dictionary)
    print(imagernce_products_dict)

    global common_desc_ratio
    global isReady
    image_search_results = []
    if jv_input_dictionary != []:
        for i in range(len(jv_input_dictionary)):
            print(jv_input_dictionary[i])
            for key in jv_input_dictionary[i]:
                if key == 'description':
                    input_description = jv_input_dictionary[i][key]
                    # Raw strings keep '\w' from being read as a string escape
                    s1w = re.findall(r'\w+', input_description.lower())
                    s1count = Counter(s1w)
                    print(input_description)
                    for j in imagernce_products_dict:
                        if j == 'Products':
                            for q in range(len(imagernce_products_dict['Products'])):
                                for key2 in imagernce_products_dict['Products'][q]:
                                    if key2 == 'description':
                                        search_description = imagernce_products_dict['Products'][q]['description']
                                        print(search_description)
                                        s2w = re.findall(r'\w+', search_description.lower())
                                        s2count = Counter(s2w)
                                        # Commonality magic
                                        common_desc_ratio = difflib.SequenceMatcher(None, s1w, s2w).ratio()
                                        print('Common ratio is: %.2f' % common_desc_ratio)

                                if common_desc_ratio > 0.09:
                                    image_search_results.append(imagernce_products_dict['Products'][q])

    if image_search_results:

        print(image_search_results)
        return image_search_results
    else:
        return {'404': 'No retailers registered with us currently own this product.'}


def perform_actual_search(image_url, imagernce_product_dictionary):
    return performSearch(get_image_search_results(image_url), imagernce_product_dictionary)

Any help resolving this would be greatly appreciated.

Answer 1

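I would configure S3 to generate notifications for events such as s3:ObjectCreated:*.

Notifications can be published to an SNS topic, sent to an SQS queue, or used to trigger a Lambda function directly.

More details about S3 notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html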
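
As a rough sketch of that configuration step, the snippet below routes ObjectCreated events to a Lambda function. It assumes boto3 (the newer AWS SDK, not the Boto 2 library used in the question); the helper names, bucket name, and ARN variable are hypothetical placeholders. Note that this call replaces any notification configuration already on the bucket, and the Lambda function must separately grant S3 permission to invoke it.

```python
def build_notification_config(lambda_arn):
    """Return an S3 notification config that fires on every object creation."""
    return {
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': lambda_arn,
                'Events': ['s3:ObjectCreated:*'],
            }
        ]
    }


def enable_upload_notifications(s3_client, bucket_name, lambda_arn):
    """Attach the config to the bucket; s3_client is assumed to be a boto3 S3 client."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   enable_upload_notifications(boto3.client('s3'), AWS_BUCKET_NAME, MY_LAMBDA_ARN)
```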

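You should rewrite your code to separate the upload part from the image-processing part. The latter can be implemented as a Lambda function in Python.
Working asynchronously is key here; blocking code usually does not scale.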
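
A minimal sketch of what the processing side could look like as a Lambda handler, assuming the standard S3 event layout (a Records list with bucket and object entries). The function names are illustrative, and the print call marks the place where the image search from the question would run.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs for each record in an S3 event."""
    uploaded = []
    for record in event.get('Records', []):
        s3_info = record['s3']
        bucket = s3_info['bucket']['name']
        # Keys arrive URL-encoded in the event (spaces become '+').
        key = unquote_plus(s3_info['object']['key'])
        uploaded.append((bucket, key))
    return uploaded


def lambda_handler(event, context):
    """Entry point invoked by Lambda once S3 reports the object as created."""
    objects = extract_uploaded_objects(event)
    for bucket, key in objects:
        # Hypothetical hook: run the image search for this object here.
        print('ready for processing: s3://%s/%s' % (bucket, key))
    return {'processed': len(objects)}
```

Because the handler only runs after S3 has emitted the event, the processing step never sees an object that is still being written.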

Answer 2

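You can compare the number of bytes written to S3 with the file size. Suppose you write to S3 with the following method:

bytes_written = key.set_contents_from_file(file_binary, rewind=True)

(in your case this would be set_contents_from_string).

Then I would compare bytes_written with p_product_image_url.seek(0, os.SEEK_END).

If they match, the whole file has been uploaded to S3.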
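
Since the question's code already holds the whole file in file_content, the comparison can also be made against len(file_content), which avoids depending on what seek() returns. A minimal sketch, assuming (as this answer does) that set_contents_from_string returns the number of bytes written; upload_and_verify is a hypothetical helper, not part of Boto.

```python
def upload_and_verify(key, data):
    """Upload data through a Boto Key-like object and confirm the byte count.

    data is expected to be a byte string, as returned by reading the uploaded file.
    """
    bytes_written = key.set_contents_from_string(data)
    return bytes_written == len(data)


# Usage inside the question's view function (names taken from the question):
#   if not upload_and_verify(k, file_content):
#       raise RuntimeError('upload to S3 was incomplete')
```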

2 answers

Answer 1
2 Accepted 2016-04-21 05:15:22

Answer 2
1 2016-04-21 01:30:58
