
How to get the oldest added object from Amazon S3 Bucket?

I have a folder within an Amazon S3 bucket that contains some objects.

How to get the oldest added object?

    public FileMetaData Poll()
    {
        var config = new AmazonS3Config();
        config.ServiceURL = "s3.amazonaws.com";
        config.CommunicationProtocol = Protocol.HTTP;

        string bucketName = "bucketname1";
        string accessKey = "accesskey1";
        string secretKey = "secretkey1";

        Amazon.S3.AmazonS3 client = AWSClientFactory.CreateAmazonS3Client(
            accessKey, secretKey, config);

        var request = new GetObjectRequest();
        request.WithBucketName(bucketName);

        // how to get the oldest object?

        GetObjectResponse response = client.GetObject(request);

        // todo
        return null;
    }

I have tried the code below, which works, but it lists all the objects and then finds the oldest, which I'd consider poor practice:

    var request = new ListObjectsRequest()
        .WithBucketName(bucketName)
        .WithPrefix(this._folderPath);

    ListObjectsResponse response = client.ListObjects(request);

    S3Object s3Object = response.S3Objects
        .Where(p => !p.Key.EndsWith("_$folder$"))
        .OrderBy(k => k.LastModified).FirstOrDefault();

    var getObjectRequest = new GetObjectRequest()
        .WithBucketName(bucketName)
        .WithKey(s3Object.Key);

    GetObjectResponse getObjectResponse = client.GetObject(getObjectRequest);

    // read the custom metadata and the object body
    string provider = getObjectResponse.Metadata.Get("x-amz-meta-provider");
    string site = getObjectResponse.Metadata.Get("x-amz-meta-sitename");
    string identifier = s3Object.Key.Remove(0, this._folderPath.Length);
    string xmlData = new StreamReader(getObjectResponse.ResponseStream, true).ReadToEnd();

    return new FileMetaData()
    {
        Identifier = identifier,
        Provider = provider,
        SiteName = site,
        XmlData = xmlData
    };

Your code seems fine. You only lose a little time on the ListObjects request, but as far as I know that listing step is unavoidable.

One problem I do see with your code is that you don't handle the fact that ListObjects returns at most 1,000 keys per request. If you can have more keys than that, you have to check whether the listing is truncated, set the request marker to the response's next marker, and issue further requests.

    var request = new ListObjectsRequest()
        .WithBucketName(bucketName)
        .WithPrefix(this._folderPath);

    ListObjectsResponse response;
    S3Object s3Object = null;
    do
    {
        response = client.ListObjects(request);
        S3Object tempS3Object = response.S3Objects
            .Where(p => !p.Key.EndsWith("_$folder$"))
            .OrderBy(k => k.LastModified).FirstOrDefault();

        // keep the older of the current candidate and this page's oldest;
        // tempS3Object can be null when a page contains only folder markers
        if (tempS3Object != null &&
            (s3Object == null || tempS3Object.LastModified < s3Object.LastModified))
        {
            s3Object = tempS3Object;
        }

        request.Marker = response.NextMarker;
    } while (response.IsTruncated);

    var getObjectRequest = new GetObjectRequest()
        .WithBucketName(bucketName)
        .WithKey(s3Object.Key);

    GetObjectResponse getObjectResponse = client.GetObject(getObjectRequest);

    // read the custom metadata and the object body
    string provider = getObjectResponse.Metadata.Get("x-amz-meta-provider");
    string site = getObjectResponse.Metadata.Get("x-amz-meta-sitename");
    string identifier = s3Object.Key.Remove(0, this._folderPath.Length);
    string xmlData = new StreamReader(getObjectResponse.ResponseStream, true).ReadToEnd();

    return new FileMetaData()
    {
        Identifier = identifier,
        Provider = provider,
        SiteName = site,
        XmlData = xmlData
    };

it loads all the objects then finds the oldest

Actually, you're not loading all the objects, you're listing them (a big difference). S3 is not a database (and not a filesystem), so you'll have to build your own local index if you want easy access (or use DynamoDB, SimpleDB, RDS, etc.).
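To make the "local index" idea concrete, here is a minimal sketch (the `OldestObjectIndex` class is hypothetical and in-memory only; in practice you would persist it in DynamoDB, SimpleDB, RDS, etc. as suggested):

```csharp
// Sketch only: record (LastModified, Key) pairs as objects are uploaded,
// so "oldest object" becomes a single lookup instead of a full bucket listing.
public class OldestObjectIndex
{
    private readonly SortedDictionary<DateTime, string> _byDate =
        new SortedDictionary<DateTime, string>();

    // call this from the same code path that performs the PutObject
    public void Record(string key, DateTime lastModifiedUtc)
    {
        _byDate[lastModifiedUtc] = key;
    }

    // the key with the smallest LastModified, or null when empty
    public string OldestKey()
    {
        foreach (var pair in _byDate)
            return pair.Value;
        return null;
    }
}
```

The trade-off is the usual one: you now have two sources of truth, so the index must be updated (or rebuilt from a one-time listing) whenever objects are added or deleted outside your own code.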

Changing your keys to include the date/time won't help you much by itself. You can use the prefix and delimiter parameters to narrow down your search. (I.e., if each file name starts with "YYYY-MM-DD-HHMM", you can set the delimiter to "-" to find the oldest year, then prefix="YYYY-" with delimiter "-" to find the oldest month, and so on.)
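That drill-down can be sketched with the same v1 SDK types used above. This assumes zero-padded key names like "folder/YYYY-MM-DD-HHMM_name.xml", so lexicographic order matches chronological order:

```csharp
// Sketch only: narrow the listing one date component at a time with
// prefix + delimiter instead of listing every key in the folder.
string prefix = this._folderPath;          // e.g. "incoming/"
for (int level = 0; level < 3; level++)    // year, then month, then day
{
    var listRequest = new ListObjectsRequest()
        .WithBucketName(bucketName)
        .WithPrefix(prefix)
        .WithDelimiter("-");
    ListObjectsResponse listResponse = client.ListObjects(listRequest);

    // CommonPrefixes holds the distinct groups up to the next "-",
    // e.g. "incoming/2012-", "incoming/2013-"; the smallest one is the
    // oldest because the date components are zero-padded.
    prefix = listResponse.CommonPrefixes.OrderBy(p => p).First();
}
// prefix now names the oldest year-month-day; list just those few keys
// and pick the one with the smallest LastModified.
```

Each iteration returns at most one entry per distinct year (or month, or day), so you stay far below the 1,000-key page limit even for very large buckets.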
