
Why do you need to immediately verify checksums when uploading or downloading from an object storage system?

Object storage systems like AWS S3 and Google Cloud Storage recommend verifying the integrity of uploaded and downloaded objects immediately after transfer, to ensure no corruption occurred.

For example, the AWS CLI doc mentions:

Upload: The AWS CLI will calculate and auto-populate the Content-MD5 header for both standard and multipart uploads. If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.

Download: The AWS CLI will attempt to verify the checksum of downloads when possible, based on the ETag header returned from a GetObject request that's performed whenever the AWS CLI downloads objects from S3. If the calculated MD5 checksum does not match the expected checksum, the file is deleted and the download is retried.
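To make the two checks above concrete, here is a minimal sketch of what the client side computes: the Content-MD5 header is the base64-encoded binary MD5 digest, and for a simple (single-part, unencrypted) upload the ETag that S3 returns is the hex MD5 of the object. The function names are hypothetical, not part of the AWS CLI:

```python
import base64
import hashlib

def content_md5(data: bytes) -> str:
    """Base64-encoded binary MD5 digest: the format the Content-MD5 header expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def matches_etag(data: bytes, etag: str) -> bool:
    """Compare a local MD5 against an S3 ETag.

    Only valid for single-part, unencrypted uploads, where the ETag is the
    hex MD5 of the object; multipart ETags use a different scheme.
    """
    return hashlib.md5(data).hexdigest() == etag.strip('"')

payload = b"hello object storage"
header_value = content_md5(payload)  # send as Content-MD5 with the PUT
```

On download, the CLI does the reverse: it hashes the bytes it received and compares against the ETag, retrying if they differ.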

Given that TCP incorporates automatic integrity checking, why do these systems require an additional checksum to verify integrity? It seems like by using TCP we should be able to ensure corruption did not occur in transfer.

There could be many reasons, but the first one that comes to mind is verifying that the client's payload didn't get corrupted (or maliciously modified) while the client was reading it from the data source, before the data was actually transferred. Similarly, there could be corruption when writing to storage on the cloud end. Using a checksum on both ends is a way to hedge against that, even if it's highly unlikely.
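The "both ends" idea can be sketched as hashing each chunk the moment it leaves the local process, then comparing that digest to one the server computes over what it actually wrote. The `send_chunk` callback here is a hypothetical stand-in for the transport, not a real API:

```python
import hashlib
import io

def upload_with_digest(src, send_chunk, chunk_size=1 << 20):
    """Stream `src` through `send_chunk`, hashing each chunk as it is read.

    Returns the hex digest to compare against the digest the remote end
    computed over the bytes it actually received and stored.
    """
    h = hashlib.sha256()
    while chunk := src.read(chunk_size):
        h.update(chunk)
        send_chunk(chunk)
    return h.hexdigest()

# Usage sketch: a fake "remote" that hashes what it receives.
remote = hashlib.sha256()
local_digest = upload_with_digest(io.BytesIO(b"payload" * 1000), remote.update)
assert local_digest == remote.hexdigest()  # both ends agree: nothing changed in flight
```

If a disk, bus, or memory error flips a bit anywhere between the read and the write, the two digests diverge and the transfer can be retried instead of silently storing bad data.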

TCP (and UDP) checksums will not always protect you, and they have been known to be weak for years (if not decades). A quick search yields these; I am sure you can find other (and maybe better) references:

https://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html
https://www.evanjones.ca/tcp-checksums.html
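One structural weakness is easy to demonstrate: the TCP/UDP/IP checksum is a 16-bit one's-complement sum (RFC 1071), and addition is commutative, so reordering 16-bit words in the payload leaves the checksum unchanged. A small sketch:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum per RFC 1071, as used by IP/TCP/UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

pkt1 = b"\x12\x34\x56\x78"
pkt2 = b"\x56\x78\x12\x34"  # same 16-bit words, swapped order
assert pkt1 != pkt2
assert internet_checksum(pkt1) == internet_checksum(pkt2)  # collision
```

So a network element that reorders or mangles data in just the wrong way can produce a corrupted segment that still passes the TCP checksum; a stronger end-to-end checksum (MD5, SHA-256, CRC32C) would catch it.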

That is not the only reason: you may experience data corruption on your local disk, or your CPU may be corrupting bits while encrypting the data (it does happen), or some other uncommon problem.

More generally, all these systems are designed to handle the corner cases and odd situations that happen so rarely that most of us will not experience them in decades. But because these systems serve so many users and move so many bytes, they experience those failures daily. In other words: rare events at large enough scale happen often.
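The scale argument is just arithmetic. With purely illustrative numbers (not measurements), suppose one transfer in ten million suffers an undetected error:

```python
# Purely illustrative failure rate, not a measured one.
p = 1e-7                      # undetected errors per transfer
per_user = 10                 # transfers/day for one user
per_service = 1_000_000_000   # transfers/day for a large service

days_to_first_error = 1 / (p * per_user)      # ~1,000,000 days for one user
errors_per_day_at_scale = p * per_service     # ~100 errors/day for the service

print(days_to_first_error / 365)  # roughly 2740 years
print(errors_per_day_at_scale)    # roughly 100
```

A single user would wait millennia to see one such error; the service sees a hundred every day, which is why it must check for them explicitly.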
