Streaming files from Amazon S3 with seek possibility in C#

I need to work with huge files in Amazon S3. How can I get part of a huge file from S3? The best approach would be to get a stream with seek support. Unfortunately, the CanSeek property of response.ResponseStream is false:

GetObjectRequest request = new GetObjectRequest();
request.BucketName = BUCKET_NAME;
request.Key = NumIdToAmazonKey(numID);
GetObjectResponse response = client.GetObject(request);
// response.ResponseStream.CanSeek == false

You could do the following to read a certain part of your file:

GetObjectRequest request = new GetObjectRequest 
{
    BucketName = bucketName,
    Key = keyName,
    ByteRange = new ByteRange(0, 10)
};

See the documentation
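
The request above only selects the byte range; you still read the bytes from the response stream. Here is a minimal sketch (assuming the same client, bucketName and keyName as above) that copies the ranged bytes into a MemoryStream, which is seekable:

using (GetObjectResponse response = client.GetObject(request))
using (var buffer = new MemoryStream())
{
    // only bytes 0-10 (the range is inclusive) are downloaded
    response.ResponseStream.CopyTo(buffer);
    buffer.Position = 0; // unlike ResponseStream, the MemoryStream supports Seek
    // ... read from buffer ...
}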

I know this isn't exactly what the OP is asking for, but I needed a seekable S3 stream so I could read Parquet files without downloading them, so I gave it a shot here: https://github.com/mukunku/RandomHelpers/blob/master/SeekableS3Stream.cs

Performance wasn't as bad as I expected. You can use the TimeWastedSeeking property to see how much time is wasted by allowing Seek() on an S3 stream.

Here's an example of how to use it:

using (var client = new AmazonS3Client(credentials, Amazon.RegionEndpoint.USEast1))
{
    using (var stream = SeekableS3Stream.OpenFile(client, "myBucket", "path/to/myfile.txt", true))
    {
        //stream is seekable!
    }
}
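
To quantify the cost mentioned above, you can inspect TimeWastedSeeking after working with the stream. A minimal sketch, assuming the property is exposed directly on the stream object (check the linked class for its exact type):

using (var stream = SeekableS3Stream.OpenFile(client, "myBucket", "path/to/myfile.txt", true))
{
    // ... seek and read as needed ...

    // how much time Seek() support cost on this S3 stream
    Console.WriteLine($"Wasted on seeking: {stream.TimeWastedSeeking}");
}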

After a frustrating afternoon with the same problem, I found the static class AmazonS3Util (https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/S3/TS3Util.html), which has a MakeStreamSeekable method.
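
A minimal sketch of using it (this assumes the client and request from the question; MakeStreamSeekable appears to copy the response stream into an in-memory seekable stream, so for huge objects combine it with a ByteRange):

// AmazonS3Util lives in the Amazon.S3.Util namespace
using (GetObjectResponse response = client.GetObject(request))
using (Stream seekable = AmazonS3Util.MakeStreamSeekable(response.ResponseStream))
{
    // seekable.CanSeek == true
    seekable.Seek(100, SeekOrigin.Begin);
    // ... read from seekable ...
}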

Way late for the OP, but I've just posted an article and code demonstration of a SeekableS3Stream that performs reasonably well in real-world use cases.

https://github.com/mlhpdx/seekable-s3-stream

Specifically, I demonstrate reading a single small file from a much larger ISO disk image using the DiscUtils library, unmodified, by implementing a random-access stream that uses Range requests to pull sections of the file as needed and keeps them in an MRU list to avoid re-downloading ranges for hot data structures in the file (e.g. zip central directory records).

Usage is similarly simple:

using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using DiscUtils.Iso9660;

namespace Seekable_S3_Stream
{
    class Program
    {
        const string BUCKET = "rds.nsrl.nist.gov";
        const string KEY = "RDS/current/RDS_ios.iso"; // "RDS/current/RDS_modern.iso";
        const string FILENAME = "READ_ME.TXT";
        static async Task Main(string[] args)
        {
            var s3 = new AmazonS3Client();

            using var stream = new Cppl.Utilities.AWS.SeekableS3Stream(s3, BUCKET, KEY, 1 * 1024 * 1024, 4);
            using var iso = new CDReader(stream, true);
            using var file = iso.OpenFile(FILENAME, FileMode.Open, FileAccess.Read);
            using var reader = new StreamReader(file);
            var content = await reader.ReadToEndAsync();

            await Console.Out.WriteLineAsync($"{stream.TotalRead / (float)stream.Length * 100}% read, {stream.TotalLoaded / (float)stream.Length * 100}% loaded");
        }
    }
}
