
Amazon S3 - different lifecycle rule for "subdirectory" than for parent "directory"

Let's say I have the following data structure:

  • /
  • /foo
  • /foo/bar
  • /foo/baz

Is it possible to assign to it the following life-cycle rules:

  • / (1 month)
  • /foo (2 months)
  • /foo/bar (3 months)
  • /foo/baz (6 months)

The official documentation is unfortunately self-contradictory in this regard. It doesn't seem to work through the AWS console, which makes me somewhat doubtful that the SDKs/REST API would behave any differently ;)

Failing that, my root problem is: I have 4 types of projects. The most rudimentary type has a few thousand projects; the other types have a few dozen each. I am obligated to store each type for a different period of time. Each project contains hundreds of thousands of objects. It looks more or less like this:

  • type A, 90% of projects, x storage required
  • type B, 6% of projects, 2x storage required
  • type C, 3% of projects, 4x storage required
  • type D, 1% of projects, 8x storage required

So far so simple. However, projects may be upgraded or downgraded from one type to another. And as I said, I have a few thousand instances of the first type, so I can't write a specific rule for each of them (remember the 1000-rule limit per bucket). And since they may move from one type to another, I can't simply place them in their own folders (e.g. one folder per type) or buckets either. Or so I think? Are there any other options open to me besides iterating over every object every time I want to purge expired files - which I would seriously rather not do because of the sheer number of objects?

Maybe some kind of file "move/transfer" between buckets that doesn't modify the creation time metadata, and isn't costly for our server to process?

Would be much obliged:)

Lifecycle policies are based on prefix, not "subdirectory."

So if objects matching the foo/ prefix are to be deleted in 2 months, it is not logical to ask for objects with a prefix of foo/bar/ to be deleted in 3 months, because they're going to be deleted after 2 months... since they also match the prefix foo/. Prefix means prefix. Delimiters are not a factor in lifecycle rules.

Also note that keys and prefixes in S3 do not begin with /. A policy affecting the entire bucket uses the empty string as a prefix, not /.

You do, also, probably want to remember the trailing slashes when you specify prefixes, because foo/bar matches the file foo/bart.jpg while foo/bar/ does not.
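
To make that concrete, here is a minimal sketch of a prefix-based lifecycle configuration in boto3. The bucket name and the 30/60-day periods are placeholders; note the empty-string prefix for the bucket-wide rule and the trailing slash on the narrower one:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket; put_bucket_lifecycle_configuration replaces the whole
# configuration, so every rule has to be supplied together.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-example-bucket',
    LifecycleConfiguration={
        'Rules': [
            {   # empty prefix = every object in the bucket
                'ID': 'whole-bucket',
                'Filter': {'Prefix': ''},
                'Status': 'Enabled',
                'Expiration': {'Days': 30},
            },
            {   # trailing slash so foo/bart.jpg is not caught
                'ID': 'foo',
                'Filter': {'Prefix': 'foo/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 60},
            },
        ]
    }
)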

Iterating over objects for deletion is not as bad as you make it out to be. The List Objects API call returns up to 1000 objects per request (or fewer, if you want), and allows you to specify both a prefix and a delimiter (usually you'll use / as the delimiter if you want the responses grouped using the pseudo-folder model the console uses to create the hierarchical display)... and each object's key and datestamp are provided in the response XML. There's also an API request to delete multiple objects in one call.
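
A rough sketch of that approach with boto3 (the bucket name, prefix, and 90-day cutoff are all placeholders):

import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client('s3')
bucket = 'my-example-bucket'                              # placeholder
cutoff = datetime.now(timezone.utc) - timedelta(days=90)  # example retention

# List every object under the prefix, 1000 keys per page, collecting expired ones.
expired = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix='projects/type-a/'):
    for obj in page.get('Contents', []):
        if obj['LastModified'] < cutoff:
            expired.append({'Key': obj['Key']})

# Delete Objects accepts up to 1000 keys per request.
for i in range(0, len(expired), 1000):
    s3.delete_objects(Bucket=bucket, Delete={'Objects': expired[i:i + 1000]})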

Any kind of move, transfer, copy, etc. will always reset the creation date of the object. Even modifying the metadata, because objects are immutable. Any time you move, transfer, copy, or "rename" an object (which is actually copy and delete), or modify metadata (which is actually copy to the same key, with different metadata) you are actually creating a new object.
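
You can see that for yourself with a small sketch (hypothetical bucket and key): a metadata "update" via copy_object onto the same key produces a new object with a new LastModified, so any lifecycle clock starts over.

import boto3

s3 = boto3.client('s3')
bucket, key = 'my-example-bucket', 'foo/bar/report.csv'   # placeholders

before = s3.head_object(Bucket=bucket, Key=key)['LastModified']

# "Changing metadata" is really a copy onto the same key, i.e. a brand-new object
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={'Bucket': bucket, 'Key': key},
    Metadata={'project-type': 'B'},
    MetadataDirective='REPLACE',
)

after = s3.head_object(Bucket=bucket, Key=key)['LastModified']
print(before, after)   # LastModified moves forward; the lifecycle clock restarts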

I ran into the same issue and worked around it using tags.
The solution has two steps:

  1. Use a Lambda function to tag objects, linking the tag value to the object's prefix
  2. Use the "tag" filter of your lifecycle rule

Example of lambda function

In your use case, you want, for example, a 6-month expiration time for objects under the /foo/baz prefix.
You can write a Lambda similar to this:

import json
import urllib.parse
import boto3
import re

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    #Get the object from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    
    # S3 keys don't start with "/", so look for the foo/baz/ prefix anywhere in the key
    tags = {
        "delete_after_six_months": "true" if re.match(pattern=r".*foo/baz/.*", string=key) else "false"
    }
    
    # applies tags
    try:
        response = s3.put_object_tagging(
            Bucket = bucket,
            Key = key,
            Tagging={
                'TagSet': [{'Key': k, 'Value': v} for k, v in tags.items()]
            }
        )
        
    except Exception as e:
        print(e)
        print('Error applying tags to {}'.format(key))
        raise e

The trigger should be adapted to your needs.
With this in place, all objects under the /foo/baz/ prefix will get a delete_after_six_months: true tag, and you can easily define the corresponding expiration policy.
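
For completeness, a minimal sketch of what that tag-filtered lifecycle rule might look like with boto3 (the bucket name is a placeholder; note that put_bucket_lifecycle_configuration replaces the bucket's whole lifecycle configuration, so include any other rules you already have):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-example-bucket',                    # placeholder
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-tagged-after-six-months',
                # Matches the tag the Lambda above applies
                'Filter': {'Tag': {'Key': 'delete_after_six_months', 'Value': 'true'}},
                'Status': 'Enabled',
                'Expiration': {'Days': 180},
            }
        ]
    }
)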

@Zardii you can use unique S3 object tags [1] for the objects under these prefixes.

Then you can apply the lifecycle policy by tag, with a different retention/deletion period for each tag.

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html

Prefix - S3 tag

  • / tag => delete_after_one_month

  • /foo tag => delete_after_two_months

  • /foo/bar tag => delete_after_three_months

  • /foo/baz tag => delete_after_six_months
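
The Lambda approach only tags objects as they are uploaded; objects that already exist would need a one-off backfill. A rough sketch, assuming a hypothetical bucket name and the prefix-to-tag mapping above (remember keys don't start with /):

import boto3

s3 = boto3.client('s3')
bucket = 'my-example-bucket'   # placeholder

# Broadest prefix first: put_object_tagging replaces the whole tag set,
# so the narrower prefixes processed later win for nested objects.
prefix_tags = {
    '':         'delete_after_one_month',    # empty prefix = whole bucket
    'foo/':     'delete_after_two_months',
    'foo/bar/': 'delete_after_three_months',
    'foo/baz/': 'delete_after_six_months',
}

paginator = s3.get_paginator('list_objects_v2')
for prefix, tag in prefix_tags.items():
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            s3.put_object_tagging(
                Bucket=bucket,
                Key=obj['Key'],
                Tagging={'TagSet': [{'Key': tag, 'Value': 'true'}]},
            )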
