
How to get the last modified date of the latest file in S3 with Boto3 (Python)?

This is the structure of my S3 bucket:

Bucket 1
    Company A
       File A-02/01/20
       File A-01/01/20
       File B-02/01/20
       File B-01/01/20

    Company B
       File A-02/01/20
       File A-01/01/20

I am trying to go to Bucket 1, navigate to the Company A folder, find the latest version of File A and print its modified date. I want to repeat the same steps for File B, and then for Company B/File A. I am new to S3 and Boto3, so I'm still learning. This is my code so far:

import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)

s3 = boto3.client('s3', region_name='us-east-1')

objects = s3.list_objects(Bucket='Bucket 1',Prefix = 'Company A'+'/File')

for o in objects["Contents"]:
    if o["LastModified"] != today:
        print(o["Key"] +" "+ str(o["LastModified"]))

This prints out the following:

File A_2019-10-28.csv 2019-11-11 18:31:17+00:00 
File A_2020-01-14.csv 2020-01-14 21:17:46+00:00 
File A_2020-01-28.csv 2020-01-29 19:19:58+00:00

But all I want is to check the latest file, File A_2020-01-28.csv, print its date if it is not today, and then do the same for File B.

Assuming that "File A" will always have a date at the end, you can include the 'A' part in the Prefix search. One thing to keep in mind with S3 is that there is no such thing as folders: they are only implied by using '/' in the key name. S3 just works with buckets and keys.
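Because keys are flat strings, the "folder", the file name and the date can only be recovered by parsing the key itself. A minimal sketch, assuming keys follow the pattern `<company>/<file>_<YYYY-MM-DD>.csv` seen in the question's output (the pattern and the helper name `parse_key` are hypothetical, not part of boto3):

```python
import re
from datetime import datetime

# Hypothetical key layout: "<company>/<file name>_<YYYY-MM-DD>.csv"
KEY_PATTERN = re.compile(
    r'^(?:(?P<folder>.*)/)?(?P<name>.+)_(?P<day>\d{4}-\d{2}-\d{2})\.csv$'
)


def parse_key(key):
    """Split an S3 key into (folder, file name, date embedded in the key).

    S3 has no real folders: the "folder" is just the part of the key
    before the last '/'. Returns None if the key does not match the pattern.
    """
    m = KEY_PATTERN.match(key)
    if m is None:
        return None
    day = datetime.strptime(m.group('day'), '%Y-%m-%d').date()
    return (m.group('folder') or '', m.group('name'), day)
```

For example, `parse_key('Company A/File A_2020-01-28.csv')` gives `('Company A', 'File A', datetime.date(2020, 1, 28))`. `strptime` is used instead of `date.fromisoformat` so the sketch also runs on Python 3.6.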

The latest version of that file is the one with the newest last_modified field. One approach is to sort the object list (of "A" files) on that attribute:

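import boto3
from operator import attrgetter

# .Bucket() exists on the boto3 *resource*, not on the client used in the question
s3 = boto3.resource('s3')

objs = s3.Bucket('Bucket 1').objects.filter(Prefix='Company A/File A')

# sort the objects based on 'obj.last_modified'
sorted_objs = sorted(objs, key=attrgetter('last_modified'))

# The latest version of the file (the last one in the list)
latest = sorted_objs.pop()
print(latest.key, latest.last_modified)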
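If you stay with the low-level client from the question instead of the resource API, the same idea works on the `Contents` dicts. A minimal sketch (the function names are hypothetical): `max()` picks the newest entry in one pass instead of sorting the whole list and copes with an empty listing, and a paginator is used because a single `list_objects_v2` call returns at most 1,000 keys.

```python
def latest_object(contents):
    """Return the entry with the newest 'LastModified', or None if there are none."""
    return max(contents, key=lambda obj: obj['LastModified'], default=None)


def latest_under_prefix(bucket, prefix):
    """Fetch every object under `prefix` (all pages) and return the newest one.

    Needs boto3 and AWS credentials, so boto3 is imported only when called.
    """
    import boto3

    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    contents = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches the prefix
        contents.extend(page.get('Contents', []))
    return latest_object(contents)
```

Usage would be something like `latest = latest_under_prefix('your-bucket', 'Company A/File A')` followed by `print(latest['Key'], latest['LastModified'])`, with your real bucket name substituted (bucket names cannot contain spaces, so 'Bucket 1' is only a placeholder).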

As an example: I created foo.txt, foo2.txt and foo3.txt in that order, then foo10.txt and foo5.txt, so foo5.txt is my latest "foo" file. Here b is the bucket resource, i.e. b = boto3.resource('s3').Bucket('foobar').

>>> b.upload_file('/var/tmp/foo.txt','foo10.txt')
>>> b.upload_file('/var/tmp/foo.txt','foo5.txt')
>>> [i.key for i in b.objects.all()]  ## ordered by key name, not by date
['foo.txt', 'foo10.txt', 'foo2.txt', 'foo3.txt', 'foo5.txt']
>>> f2 = sorted(b.objects.all(), key=attrgetter('last_modified'))
>>> f2
[s3.ObjectSummary(bucket_name='foobar', key='foo.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo2.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo3.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo10.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')]
>>> f2.pop()
s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')
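To repeat this for every company and file (as the question asks) without one listing per file, you could list the whole bucket once and keep the newest entry per file. A minimal sketch, assuming keys like 'Company A/File A_2020-01-28.csv', so everything before the last '_' identifies the file (this grouping rule and the name `latest_per_file` are hypothetical):

```python
def latest_per_file(contents):
    """Keep the newest listing entry for each file, keyed by the key minus its date suffix.

    Hypothetical assumption: 'Company A/File A_2020-01-28.csv' belongs to the
    group 'Company A/File A' (everything before the last '_').
    """
    latest = {}
    for obj in contents:
        group = obj['Key'].rpartition('_')[0]
        if not group:
            continue  # key has no '_<date>' suffix; skip it
        current = latest.get(group)
        if current is None or obj['LastModified'] > current['LastModified']:
            latest[group] = obj
    return latest
```

The `contents` argument is the list of dicts gathered from the `Contents` of every `list_objects_v2` page; iterating over `sorted(latest_per_file(contents).items())` then gives one line per company/file pair.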

For more details on Python sorting see: https://wiki.python.org/moin/HowTo/Sorting

Almost there. However, the if statement compares two datetime objects that include both the date and the time, so they will almost never be equal (the times differ). If you only care about the date, change the if to:

    if o["LastModified"].date() != today.date():

Works on Python 3.6.9.
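Since `today` in the question is taken in UTC and boto3 returns `LastModified` as a timezone-aware datetime, `.date()` compares calendar days, which is only safe if both values are in the same time zone. A minimal sketch of the check as a function that normalizes both to UTC first (the name `modified_today` is hypothetical; `now` is a parameter only so it can be tested):

```python
from datetime import datetime, timezone


def modified_today(last_modified, now=None):
    """True if `last_modified` falls on the same UTC calendar day as `now`.

    Both values are converted to UTC before taking .date(), matching the
    question's `today = datetime.now(timezone.utc)`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return last_modified.astimezone(timezone.utc).date() == now.astimezone(timezone.utc).date()
```

In the question's loop this becomes `if not modified_today(o["LastModified"]):`.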
