How to programmatically get the MD5 checksum of an Amazon S3 file using boto

Related posts: "Amazon S3 & Checksum", "How to encode md5 sum into base64 in BASH"

I have to download a tar file from an S3 bucket with limited access (the permissions I have been given mostly allow downloading only).

After downloading, I have to check the MD5 checksum of the downloaded file against the MD5 checksum stored as metadata on the object in S3.

I currently use an S3 file browser to manually note the "x-amz-meta-md5" value in the content headers and validate it against the computed MD5 of the downloaded file.

I would like to know whether there is a programmatic way, using boto, to capture the MD5 hash value of an S3 file that is stored as metadata.

from boto.s3.connection import S3Connection

# access_key and secret_key hold the AWS credentials
conn = S3Connection(access_key, secret_key)
bucket = conn.get_bucket("test-bucket")
rs_keys = bucket.get_all_keys()
for key_val in rs_keys:
    # Missing piece: key_val.HOW_TO_GET_MD5_FROM_METADATA(?)
    print(key_val)

Please correct me if my understanding is wrong. I am looking for a way to capture the header data programmatically.

When boto downloads a file using any of the get_contents_to_* methods, it computes the MD5 checksum of the bytes it downloads and makes that available as the md5 attribute of the Key object. In addition, S3 sends an ETag header in the response that represents the server's idea of what the MD5 checksum is. This is available as the etag attribute of the Key object. So, after downloading a file, you can compare the values of those two attributes to see whether they match.
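As a rough illustration of that comparison, the sketch below recomputes the MD5 of the downloaded file with the standard library's hashlib instead of relying on boto. The helper names md5_of_file and matches_etag are hypothetical. It assumes the ETag string was read from the key's etag attribute, that it may be wrapped in double quotes, and that the object was not uploaded in multiple parts.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, etag):
    """Compare a local file's MD5 with an ETag value.

    Assumption: the ETag is a plain MD5 (not a multipart ETag).
    Surrounding double quotes, if any, are stripped before comparing.
    """
    return md5_of_file(path) == etag.strip('"').lower()
```

A call such as matches_etag("download.tar", key.etag) would then report whether the local copy agrees with what S3 reported; the file name here is only a placeholder.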

If you want to find out what S3 thinks the MD5 is without actually downloading the file (as shown in your example), you could just do this:

for key_val in rs_keys:
    print key_val, key_val.etag
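Since the question asks specifically about the value stored under x-amz-meta-md5, here is a minimal sketch of reading it. It assumes boto 2, where, as far as I know, keys returned by a bucket listing do not carry user metadata, whereas bucket.get_key(name) issues a HEAD request and populates it. key.get_metadata("md5") should then return the x-amz-meta-md5 value, because boto drops the x-amz-meta- prefix. The function name stored_md5 is hypothetical, and the metadata name "md5" is taken from the header mentioned in the question.

```python
def stored_md5(bucket, key_name, meta_name="md5"):
    """Return the user-metadata MD5 for key_name, or None if unavailable.

    `bucket` is expected to behave like a boto 2 Bucket
    (for example, the object returned by conn.get_bucket).
    """
    key = bucket.get_key(key_name)  # HEAD request; None if the key is missing
    if key is None:
        return None
    # Returns None when no x-amz-meta-<meta_name> header was stored.
    return key.get_metadata(meta_name)
```

Inside the loop from the question this could be called as stored_md5(bucket, key_val.name), at the cost of one extra request per key.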

It seems well established that the ETag is not the md5sum if the file was assembled by a multipart upload. I think in that case one's only recourse is to download the file and compute a checksum locally. If the result is correct, the S3 copy must be good. If the local checksum is wrong, the S3 copy may be bad, or the download might have failed. If you no longer have the original file or a record of its md5sum, I think you're out of luck. It would be great if the md5sum of the assembled file were available, or if there were a way to locally compute the expected ETag of a file to be uploaded via multipart.
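On that last point, a scheme widely reported for multipart ETags is the MD5 of the concatenated binary MD5 digests of the individual parts, followed by a hyphen and the number of parts. The sketch below implements that scheme. It is not taken from this thread, and it only applies if that scheme holds for your object and you know the part size that was used for the upload. The name multipart_etag and the 8 MiB default part size are hypothetical.

```python
import hashlib


def multipart_etag(path, part_size=8 * 1024 * 1024):
    """Compute an ETag in the commonly reported multipart form.

    Assumption: ETag = md5(md5(part1) + md5(part2) + ...) + "-<part count>",
    where each inner md5 is the raw 16-byte digest of one part.
    """
    part_digests = []
    with open(path, "rb") as fh:
        for part in iter(lambda: fh.read(part_size), b""):
            part_digests.append(hashlib.md5(part).digest())
    if not part_digests:
        # Empty file: no parts were read, so fall back to the plain MD5.
        return hashlib.md5(b"").hexdigest()
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return "%s-%d" % (combined, len(part_digests))
```

If the value computed this way differs from the ETag S3 reports, that alone does not prove corruption, since a different part size at upload time produces a different ETag.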
