
How to programmatically get the MD5 Checksum of Amazon S3 file using boto

Related posts: Amazon S3 & Checksum, How to encode md5 sum into base64 in BASH

I have to download a tar file from an S3 bucket to which I have limited access (mostly download-only permissions).

After downloading, I have to check the MD5 checksum of the downloaded file against the MD5 checksum stored as metadata on the object in S3.

I currently use an S3 file browser to manually note the "x-amz-meta-md5" value in the object's headers and validate it against the computed MD5 of the downloaded file.

I would like to know whether there is a programmatic way, using boto, to read the MD5 hash value that is stored as metadata on an S3 file.

from boto.s3.connection import S3Connection

conn = S3Connection(access_key, secret_key)
bucket=conn.get_bucket("test-bucket")
rs_keys = bucket.get_all_keys()
for key_val in rs_keys:
    print key_val, key_val.**HOW_TO_GET_MD5_FROM_METADATA(?)**

Please correct me if my understanding is wrong. I am looking for a way to capture the header data programmatically.

When boto downloads a file using any of the get_contents_to_* methods, it computes the MD5 checksum of the bytes it downloads and makes that available as the md5 attribute of the Key object. In addition, S3 sends an ETag header in the response that represents the server's idea of what the MD5 checksum is. This is available as the etag attribute of the Key object. So, after downloading a file you could just compare the value of those two attributes to see if they match.
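If you would rather verify the bytes on disk yourself than rely on the md5 attribute, a minimal sketch could look like the following. The helper names file_md5_hex and matches_etag are made up for illustration; the only boto-specific assumption is that the etag attribute arrives wrapped in double quotes, which is why they are stripped before comparing. The comparison is only meaningful for objects that were not uploaded in multiple parts.

```python
import hashlib


def file_md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, etag):
    """Compare a local file's MD5 with an S3 ETag string.

    The surrounding double quotes are stripped first (hypothetical helper;
    not valid for multipart uploads).
    """
    return file_md5_hex(path) == etag.strip('"').lower()
```

After something like key.get_contents_to_filename(path), you could call matches_etag(path, key.etag) and treat a False result as a failed download or a corrupted object.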

If you want to find out what S3 thinks the MD5 is without actually downloading the file (as shown in your example) you could just do this:

for key_val in rs_keys:
    print key_val, key_val.etag
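If what you need is the custom "x-amz-meta-md5" value rather than the ETag, note that keys returned by a bucket listing do not carry user metadata. A rough sketch, assuming the metadata was stored under the name md5, is to fetch each key individually with get_key (a HEAD request) and then call get_metadata. The function name stored_md5 and its meta_name parameter are hypothetical; bucket is assumed to be the boto Bucket from your example.

```python
def stored_md5(bucket, key_name, meta_name="md5"):
    """Return the x-amz-meta-<meta_name> value for one object, or None.

    `bucket` is assumed to be a boto Bucket, e.g. the result of
    conn.get_bucket("test-bucket") in the question.
    """
    # get_key issues a HEAD request, so the object body is not downloaded;
    # it returns None when the key does not exist.
    key = bucket.get_key(key_name)
    if key is None:
        return None
    # boto exposes user metadata without the "x-amz-meta-" prefix.
    return key.get_metadata(meta_name)
```

Inside your loop this would be called as stored_md5(bucket, key_val.name). Whether the stored value is hex or base64 depends on whoever uploaded the file, so you may need to convert before comparing it with a locally computed digest.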

It seems well established that the ETag is not the MD5 sum if the object was assembled by a multipart upload. I think in that case one's only recourse is to download the file and compute a checksum locally. If the result is correct, the S3 copy must be good. If the local checksum is wrong, the S3 copy may be bad, or the download might have failed. If you no longer have the original file or a record of its MD5 sum, I think you're out of luck. It would be great if the MD5 sum of the assembled file were available, or if there were a way to locally compute the expected ETag of a file to be uploaded via multipart.
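For what it is worth, a commonly reported (but, as far as I know, not officially guaranteed) scheme for multipart ETags is the MD5 of the concatenated binary MD5 digests of each part, followed by a dash and the number of parts. The sketch below implements that assumption. The function name multipart_etag is made up, and part_size must match the part size used for the original upload, which you have to know independently.

```python
import hashlib


def multipart_etag(path, part_size):
    """Guess the ETag S3 would report for a multipart upload of `path`.

    Assumes the widely observed scheme: MD5 over the concatenated binary
    MD5 digests of the parts, then '-<number of parts>'.
    """
    part_digests = []
    with open(path, "rb") as fh:
        # Read the file in pieces of exactly part_size bytes (last may be shorter)
        for part in iter(lambda: fh.read(part_size), b""):
            part_digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return "%s-%d" % (combined, len(part_digests))
```

If the value computed this way equals key.etag with its quotes stripped, that is reasonable evidence the upload matched the local file. A mismatch may only mean the part size guess was wrong.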

