How to get last modified date of latest file from S3 with Boto Python?

This is the structure of my S3 bucket:

Bucket 1
    Company A
       File A-02/01/20
       File A-01/01/20
       File B-02/01/20
       File B-01/01/20

    Company B
       File A-02/01/20
       File A-01/01/20

I am trying to go to Bucket 1, navigate to the Company A folder, find the latest version of File A, and print its modified date. I want to repeat the same steps for File B, and then for File A in the Company B folder. I am new to S3 and Boto3, so I am still learning. This is my code so far:

import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)

s3 = boto3.client('s3', region_name='us-east-1')

objects = s3.list_objects(Bucket='Bucket 1',Prefix = 'Company A'+'/File')

for o in objects["Contents"]:
    if o["LastModified"] != today:
        print(o["Key"] +" "+ str(o["LastModified"]))

This prints out the following:

File A_2019-10-28.csv 2019-11-11 18:31:17+00:00 
File A_2020-01-14.csv 2020-01-14 21:17:46+00:00 
File A_2020-01-28.csv 2020-01-29 19:19:58+00:00

But all I want is to check File A_2020-01-28.csv and print it if its modified date != today, and the same for File B.

Assuming that "File A" will always have a date at the end, you could use the 'A' part in the Prefix search. One thing to keep in mind with S3 is that there is no such thing as folders. That is something you imply by using '/' in the key name. S3 just works on buckets and keys.
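To illustrate the point about keys, here is a minimal sketch in plain Python that makes no AWS call. The key names are made up, modeled on the output shown in the question, and filter_by_prefix is a hypothetical helper that only approximates what a Prefix filter does: plain string matching against the full key.

```python
# Hypothetical keys, modeled on the listing in the question.
# Each key is one flat string; the '/' is just another character.
keys = [
    "Company A/File A_2020-01-14.csv",
    "Company A/File A_2020-01-28.csv",
    "Company A/File B_2020-01-28.csv",
    "Company B/File A_2020-01-28.csv",
]


def filter_by_prefix(keys, prefix):
    """Keep only the keys that start with the given prefix."""
    return [k for k in keys if k.startswith(prefix)]


file_a_in_company_a = filter_by_prefix(keys, "Company A/File A")
print(file_a_in_company_a)
```

Because the prefix is matched against the whole key, 'Company A/File A' narrows the listing to one "folder" and one file family in a single request.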

The latest version of that file would be the version that has the newest last_modified field. One approach is to sort the object list (of "A" files) on that attribute:

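import boto3
from operator import attrgetter

# s3.Bucket(...) needs the resource API, not the boto3.client used in the question
s3 = boto3.resource('s3')

objs = s3.Bucket('Bucket 1').objects.filter(Prefix='Company A/File A')

# sort the objects based on 'obj.last_modified'
sorted_objs = sorted(objs, key=attrgetter('last_modified'))

# The latest version of the file (the last one in the list)
latest = sorted_objs.pop()
print(latest.key, latest.last_modified)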
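The question also asks to repeat this for File B and for Company B. One possible way to do that in a single pass is sketched below using the client API from the question. It assumes keys follow the 'Company A/File A_2020-01-28.csv' pattern seen in the question's output, so the text before the last underscore identifies the file family. The helper names (iter_objects, group_name, latest_per_group) and the File B sample entries are hypothetical, and the sample list stands in for a real listing.

```python
from datetime import datetime, timezone


def iter_objects(bucket, prefix):
    """Yield object dicts ('Key', 'LastModified', ...) under a prefix."""
    import boto3  # imported lazily so the helpers below run without AWS access

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing when a page has no matching keys
        yield from page.get("Contents", [])


def group_name(key):
    """Assumption: the date suffix follows the last underscore in the key."""
    return key.rsplit("_", 1)[0]


def latest_per_group(objects):
    """Return {group name: object with the newest LastModified}."""
    latest = {}
    for obj in objects:
        name = group_name(obj["Key"])
        if name not in latest or obj["LastModified"] > latest[name]["LastModified"]:
            latest[name] = obj
    return latest


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Sample data standing in for iter_objects(<bucket>, "Company A/").
# File A timestamps come from the question; File B entries are made up.
sample = [
    {"Key": "Company A/File A_2019-10-28.csv", "LastModified": utc(2019, 11, 11, 18, 31, 17)},
    {"Key": "Company A/File A_2020-01-14.csv", "LastModified": utc(2020, 1, 14, 21, 17, 46)},
    {"Key": "Company A/File A_2020-01-28.csv", "LastModified": utc(2020, 1, 29, 19, 19, 58)},
    {"Key": "Company A/File B_2020-01-14.csv", "LastModified": utc(2020, 1, 14, 21, 20, 0)},
    {"Key": "Company A/File B_2020-01-28.csv", "LastModified": utc(2020, 1, 29, 19, 25, 0)},
]

result = latest_per_group(sample)
for name, obj in sorted(result.items()):
    print(name, "->", obj["Key"], obj["LastModified"])
```

Tracking a running maximum per group avoids sorting the whole listing, and the paginator matters because a single list call returns at most 1000 keys.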

As an example: I created foo1.txt, foo2.txt, foo3.txt in order. Then foo10.txt, foo5.txt. foo5.txt is my latest "foo" file.

>>> b.upload_file('/var/tmp/foo.txt','foo10.txt')
>>> b.upload_file('/var/tmp/foo.txt','foo5.txt')
>>> [i.key for i in b.objects.all()]  ## no ordering
['foo.txt', 'foo10.txt', 'foo2.txt', 'foo3.txt', 'foo5.txt']
>>> f2 = sorted(b.objects.all(), key=attrgetter('last_modified'))
>>> f2
[s3.ObjectSummary(bucket_name='foobar', key='foo.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo2.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo3.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo10.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')]
>>> f2.pop()
s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')

For more details on Python sorting, see: https://wiki.python.org/moin/HowTo/Sorting

Almost there. However, the if statement compares two different datetime objects that contain both a date and a time, so the times will differ and the condition is practically always true. If you only care about the date, change the if to:

    if o["LastModified"].date() != today.date():

Works on Python 3.6.9.
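A small self-contained sketch of that date-only comparison follows. The helper name modified_on_same_day is hypothetical, and it assumes both values are timezone-aware datetimes (boto3 returns LastModified as a timezone-aware UTC datetime). Both sides are converted to UTC before taking the date so they refer to the same calendar day.

```python
from datetime import datetime, timezone


def modified_on_same_day(last_modified, now=None):
    """True if last_modified falls on the same UTC calendar day as now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (
        last_modified.astimezone(timezone.utc).date()
        == now.astimezone(timezone.utc).date()
    )


# Timestamp taken from the question's output for File A_2020-01-28.csv
last_modified = datetime(2020, 1, 29, 19, 19, 58, tzinfo=timezone.utc)

same_day = datetime(2020, 1, 29, 23, 0, 0, tzinfo=timezone.utc)
next_day = datetime(2020, 1, 30, 0, 0, 1, tzinfo=timezone.utc)

print(modified_on_same_day(last_modified, same_day))  # True
print(modified_on_same_day(last_modified, next_day))  # False
```

Passing now explicitly keeps the check easy to test; in the loop from the question, the condition would become the negation of this helper.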


 