
How can I get only the latest file/files created/modified on an S3 location through Python

Using boto, I tried the code below:

from boto.s3.connection import S3Connection
conn = S3Connection('XXX', 'YYYY')

bucket = conn.get_bucket('myBucket')

file_list = bucket.list('just/a/prefix/')

but I am unable to get the length of the list or the last element of file_list, because it is a BucketListResultSet, not a plain list. Please suggest a solution for this scenario.
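(For reference, boto 2's BucketListResultSet is a lazy iterable; it can simply be materialized with list() before taking its length or last element. A minimal sketch follows; the S3 call itself is commented out because it needs credentials, and it assumes that on boto Key objects last_modified is an ISO-8601 string, so plain string comparison orders it chronologically.)

```python
# Sketch only: materialize the lazy BucketListResultSet before using
# len() or indexing. Requires a live S3 connection, hence commented out:
# keys = list(bucket.list('just/a/prefix/'))

def newest(keys):
    """Return the key with the greatest last_modified timestamp.

    Assumes each item has a `last_modified` attribute whose values
    compare chronologically (true for ISO-8601 strings in one timezone).
    """
    return max(keys, key=lambda k: k.last_modified)
```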

You are trying to use the boto library, which is rather obsolete and no longer maintained; its number of open issues keeps growing.

Better to use the actively developed boto3.

First, let us define the parameters of our search:

>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"

Import boto3 and create s3 representing the S3 resource:

>>> import boto3
>>> s3 = boto3.resource("s3")

Get the bucket:

>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')

Define a filter for objects with the given prefix:

>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)

and iterate over it:

>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...

Each obj is an ObjectSummary (not an Object itself), but it holds enough to learn something about the object:

>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key=u'region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
boto3.resources.factory.s3.ObjectSummary

You can get the Object from it and use it as you need:

>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key=u'region/cz/rodos/fusion/AdvancedDataFusion.xml')

There are not many options for filtering, but Prefix is available.
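Putting this together for the original question, the newest object under a prefix can be picked with the built-in max(). This is a sketch: ObjectSummary.last_modified is a timezone-aware datetime in boto3, so summaries can be compared by it directly; the live S3 call is commented out because it needs credentials.

```python
def latest_object(summaries):
    """Return the summary with the newest last_modified, or None if empty.

    Works on any iterable of objects carrying a comparable
    `last_modified` attribute (boto3 ObjectSummary qualifies).
    """
    return max(summaries, key=lambda s: s.last_modified, default=None)

# With a live bucket (needs credentials), this would be:
# latest = latest_object(bucket.objects.filter(Prefix=prefix))
# print(latest.key, latest.last_modified)
```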

As an addendum to Jan's answer:


It seems that the boto3 library has changed in the meantime; currently (version 1.6.19 at the time of writing) the filter method offers more parameters:

object_summary_iterator = bucket.objects.filter(
    Delimiter='string',
    EncodingType='url',
    Marker='string',
    MaxKeys=123,
    Prefix='string',
    RequestPayer='requester'
)

Three useful parameters for limiting the number of entries in your scenario are Marker, MaxKeys and Prefix:

  • Marker (string) -- Specifies the key to start with when listing objects in a bucket.
  • MaxKeys (integer) -- Sets the maximum number of keys returned in the response. The response might contain fewer keys but will never contain more.
  • Prefix (string) -- Limits the response to keys that begin with the specified prefix.
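As an illustration of how these parameters combine, here is a small helper that resumes a listing after a known key. The helper name list_after is hypothetical; the filter() call and its Marker/MaxKeys/Prefix parameters are the API described above.

```python
def list_after(bucket, marker, prefix="", page_size=1000):
    """Resume listing the bucket after `marker`, restricted to `prefix`.

    `list_after` is a hypothetical wrapper; it only forwards the
    Marker/MaxKeys/Prefix parameters to bucket.objects.filter().
    """
    return bucket.objects.filter(Marker=marker, MaxKeys=page_size, Prefix=prefix)
```

Remember that the key passed as Marker is itself excluded from the result, so the listing continues with the key that follows it.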

Two notes:

  • The key you specify as Marker will not be included in the result, i.e. the listing starts from the key following the one you specify as Marker.

  • The boto3 library performs automatic pagination on the results. The size of each page is determined by the MaxKeys parameter of the filter function (defaulting to 1000).

    If you iterate over the s3.Bucket.objectsCollection object beyond that, it will automatically download the next page. While this is generally useful, it can be surprising when you specify e.g. MaxKeys=10 and want to iterate over only 10 keys: the iterator will still go over all matched keys, simply issuing a new request to the server every 10 keys.
    So, if you want only e.g. the first three results, break out of the loop manually; don't rely on the iterator.

    (Unfortunately this is not clear in the docs, and is in fact quite misleading there, because the library parameter description is copied from the API parameter description, where it does make sense: "The response might contain fewer keys but will never contain more.")
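One way to "break out of the loop manually" without writing the loop yourself is itertools.islice, which stops consuming the iterator after n items, so boto3 never requests further pages. A sketch (the live S3 call is commented out because it needs credentials):

```python
from itertools import islice

def first_n(iterable, n):
    """Consume at most n items and stop; boto3 will not fetch more pages."""
    return list(islice(iterable, n))

# With a live bucket (needs credentials), this would take the first
# three keys in S3's lexicographic listing order:
# first_three = first_n(bucket.objects.filter(Prefix=prefix), 3)
```

Note that this takes the first n keys in listing order (lexicographic), not the n most recently modified ones.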

