How can I get only the latest file(s) created/modified in an S3 location through Python?
Using boto, I tried the code below:
from boto.s3.connection import S3Connection
conn = S3Connection('XXX', 'YYYY')
bucket = conn.get_bucket('myBucket')
file_list = bucket.list('just/a/prefix/')
but I am unable to get the length of the list or the last element of file_list, as it is a BucketListResultSet. Please suggest a solution for this scenario.
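A BucketListResultSet is a lazy iterable: it fetches pages of keys from S3 only as you loop over it, which is why len() and indexing do not work on it. A minimal sketch, assuming the listing fits comfortably in memory, is to materialize it with list() first. Keep in mind that S3 returns keys in ascending key order, so the last element is the alphabetically last key, not necessarily the newest one. The helper name `materialize_listing` is hypothetical, and the generator below only stands in for a real `bucket.list(...)` call.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def materialize_listing(result_set: Iterable[T]) -> Tuple[List[T], int]:
    """Consume a lazy listing (e.g. boto's BucketListResultSet) into a list.

    Iterating the result set is what triggers the S3 requests, so this
    loads every key under the prefix into memory at once.
    """
    items = list(result_set)
    return items, len(items)


if __name__ == "__main__":
    # Hypothetical stand-in for bucket.list('just/a/prefix/'): a generator is
    # also lazy, so calling len() on it directly would raise TypeError.
    fake_listing = (k for k in ["just/a/prefix/a.txt", "just/a/prefix/b.txt"])
    items, count = materialize_listing(fake_listing)
    print(count)      # 2
    print(items[-1])  # just/a/prefix/b.txt (last by key order, not by date)
```

With boto this would be `keys, count = materialize_listing(bucket.list('just/a/prefix/'))`; to find the newest file, compare each key's `last_modified` attribute instead of relying on the last element.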
You are trying to use the boto library, which is rather obsolete and no longer maintained. The number of open issues with this library keeps growing.
Better to use the actively developed boto3.
First, let us define the parameters of our search:
>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"
Import boto3 and create s3, representing the S3 resource:
>>> import boto3
>>> s3 = boto3.resource("s3")
Get the bucket:
>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')
Define a filter for objects with the given prefix:
>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)
and iterate over it:
>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...
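Since each ObjectSummary carries last_modified (a timezone-aware datetime in boto3), the original question, getting only the latest file(s), can be answered by scanning the listing once and keeping the N newest entries. This is a minimal sketch, assuming you pass in the collection returned by bucket.objects.filter(Prefix=prefix); `latest_objects` is a hypothetical helper name, and the demo uses simple stand-in objects instead of real S3 summaries.

```python
import heapq
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Iterable, List


def latest_objects(summaries: Iterable[Any], n: int = 1) -> List[Any]:
    """Return the n most recently modified items, newest first.

    Works on anything with a `last_modified` attribute, e.g. the
    ObjectSummary items yielded by bucket.objects.filter(Prefix=...).
    The whole listing is still paged through once; only n items are kept.
    """
    return heapq.nlargest(n, summaries, key=lambda s: s.last_modified)


if __name__ == "__main__":
    # Hypothetical stand-ins for ObjectSummary (only key/last_modified used).
    fake = [
        SimpleNamespace(key="region/cz/a.txt",
                        last_modified=datetime(2018, 3, 1, tzinfo=timezone.utc)),
        SimpleNamespace(key="region/cz/b.txt",
                        last_modified=datetime(2018, 5, 1, tzinfo=timezone.utc)),
        SimpleNamespace(key="region/cz/c.txt",
                        last_modified=datetime(2018, 4, 1, tzinfo=timezone.utc)),
    ]
    for obj in latest_objects(fake, n=2):
        print(obj.key, obj.last_modified)  # b.txt first, then c.txt
```

Against a real bucket this would be `latest_objects(bucket.objects.filter(Prefix=prefix), n=3)`. heapq.nlargest avoids sorting the full listing when only a few entries are needed.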
Each obj is an ObjectSummary (not the Object itself), but it holds enough to learn something about it:
>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key=u'region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
boto3.resources.factory.s3.ObjectSummary
You can get the Object from it and use it as you need:
>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key=u'region/cz/rodos/fusion/AdvancedDataFusion.xml')
There are not many options for filtering, but a prefix is available.
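Because the listing cannot be narrowed by modification date on the server side, any time-based selection has to happen on the client after the objects are listed. A sketch, assuming you want everything modified within a recent window; `modified_since` is a hypothetical helper, and the cutoff must be timezone-aware because boto3's last_modified values are.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def modified_since(summaries: Iterable[Any], cutoff: datetime) -> Iterator[Any]:
    """Yield items whose last_modified is at or after `cutoff`.

    `cutoff` should be timezone-aware: comparing an aware datetime with a
    naive one raises TypeError.
    """
    for s in summaries:
        if s.last_modified >= cutoff:
            yield s


if __name__ == "__main__":
    now = datetime(2018, 5, 2, 12, 0, tzinfo=timezone.utc)
    # Hypothetical stand-ins for ObjectSummary items.
    fake = [
        SimpleNamespace(key="region/cz/old.txt", last_modified=now - timedelta(days=3)),
        SimpleNamespace(key="region/cz/new.txt", last_modified=now - timedelta(hours=2)),
    ]
    recent = modified_since(fake, cutoff=now - timedelta(days=1))
    print([s.key for s in recent])  # ['region/cz/new.txt']
```

With real S3 the call would look like `modified_since(bucket.objects.filter(Prefix=prefix), datetime.now(timezone.utc) - timedelta(days=1))`; the prefix still does the server-side narrowing, and the date check only runs on what comes back.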
As an addendum to Jan's answer:
It seems that the boto3 library has changed in the meantime and currently (version 1.6.19 at the time of writing) offers more parameters for the filter method:
object_summary_iterator = bucket.objects.filter(
    Delimiter='string',
    EncodingType='url',
    Marker='string',
    MaxKeys=123,
    Prefix='string',
    RequestPayer='requester'
)
Three useful parameters to limit the number of entries for your scenario are Marker, MaxKeys and Prefix:
Marker (string) -- Specifies the key to start with when listing objects in a bucket.
MaxKeys (integer) -- Sets the maximum number of keys returned in the response. The response might contain fewer keys but will never contain more.
Prefix (string) -- Limits the response to keys that begin with the specified prefix.
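To make the semantics of these three parameters concrete, here is a purely local simulation of what a single ListObjects page returns for a set of keys; it does not call S3, and `list_page` is a hypothetical helper. S3 returns keys in ascending key order, so Marker acts as an exclusive lower bound, Prefix as a filter, and MaxKeys as a cap on one page.

```python
from typing import List, Optional


def list_page(all_keys: List[str], prefix: str = "",
              marker: Optional[str] = None, max_keys: int = 1000) -> List[str]:
    """Simulate one ListObjects page over an in-memory list of keys.

    - keys come back in ascending order (S3 lists keys sorted by key)
    - only keys starting with `prefix` are returned
    - only keys strictly after `marker` are returned (marker itself excluded)
    - at most `max_keys` keys are returned
    """
    matching = sorted(k for k in all_keys if k.startswith(prefix))
    if marker is not None:
        matching = [k for k in matching if k > marker]
    return matching[:max_keys]


if __name__ == "__main__":
    keys = ["region/cz/c", "region/sk/x", "region/cz/a", "region/cz/b"]
    print(list_page(keys, prefix="region/cz/"))
    print(list_page(keys, prefix="region/cz/", marker="region/cz/a", max_keys=1))
```

The second call prints only `region/cz/b`: the marker key itself is skipped and MaxKeys caps that single page. Python's default string ordering matches S3's UTF-8 binary key order, which is why plain sorted() is used here.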
Two notes:
The key you specify for Marker will not be included in the result, i.e. the listing starts from the key following the one you specify as Marker.
The boto3 library performs automatic pagination on the results. The size of each page is determined by the MaxKeys parameter of the filter function (defaulting to 1000).
If you iterate over the s3.Bucket.objectsCollection object beyond that, it will automatically download the next page. While this is generally useful, it might be surprising when you specify e.g. MaxKeys=10 and want to iterate over only 10 keys: the iterator will still go over all matched keys, just with a new request to the server for every 10 keys.
So, if you just want e.g. the first three results, break off the loop manually; don't rely on the iterator.
(Unfortunately, this is not clear in the docs, which are actually quite wrong here: the library parameter description is copied from the API parameter description, where it does make sense: "The response might contain fewer keys but will never contain more.")
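A way to "break off the loop" without writing the break yourself is itertools.islice, which stops pulling from the iterator after N items, so no further pages are requested. The sketch below uses a hypothetical local class, `PagedListing`, that only imitates an auto-paginating collection and counts simulated requests; it is not boto3 code, but it shows why MaxKeys alone does not stop the iteration.

```python
from itertools import islice
from typing import Iterator, List


class PagedListing:
    """Local stand-in for an auto-paginating collection such as
    bucket.objects.filter(Prefix=..., MaxKeys=page_size).

    Each simulated request returns at most `page_size` keys; `requests`
    counts how many round trips a loop has triggered.
    """

    def __init__(self, keys: List[str], page_size: int) -> None:
        self.keys = keys
        self.page_size = page_size
        self.requests = 0

    def __iter__(self) -> Iterator[str]:
        start = 0
        while True:
            self.requests += 1  # one simulated ListObjects call per page
            yield from self.keys[start:start + self.page_size]
            start += self.page_size
            if start >= len(self.keys):
                return


if __name__ == "__main__":
    keys = [f"region/cz/{i:02d}.txt" for i in range(25)]

    listing = PagedListing(keys, page_size=10)
    print(len(list(listing)), listing.requests)  # 25 3: MaxKeys only sets page size

    listing = PagedListing(keys, page_size=10)
    print(list(islice(listing, 3)), listing.requests)  # first 3 keys, 1 request
```

Against real S3 the same pattern is `list(islice(bucket.objects.filter(Prefix=prefix), 3))`. boto3 resource collections also have `limit(count)` and `page_size(count)` methods, which express "stop after N items" and "fetch N per request" directly.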