
Can't list bucket objects on Scaleway using boto3

I saw a few similar posts, but unfortunately none helped me. I have an S3 bucket (on Scaleway), and I'm trying to simply list all objects contained in that bucket, using the boto3 S3 client as follows:

import boto3

s3 = boto3.client('s3',
                  region_name=AWS_S3_REGION_NAME,
                  endpoint_url=AWS_S3_ENDPOINT_URL,
                  aws_access_key_id=AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET_ACCESS_KEY
                  )

all_objects = s3.list_objects_v2(Bucket=AWS_STORAGE_BUCKET_NAME)

This simple piece of code responds with an error:

botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.

First, the error seems inappropriate to me since I'm not specifying any key to search. I also tried to pass a Prefix argument to this method to narrow down the search to a specific subdirectory; same error. Second, I tried to achieve the same thing using a boto3 Resource rather than a Client, as follows:

session = boto3.Session(
    region_name=AWS_S3_REGION_NAME,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)
resource = session.resource(
    's3',
    endpoint_url=AWS_S3_ENDPOINT_URL,
)

for bucket in resource.buckets.all():
    print(bucket.name)

That code produces absolutely nothing. One weird thing that strikes me is that I don't pass the bucket_name anywhere here, which seems to be normal according to the AWS documentation.

There's no chance that I misconfigured the client, since I'm able to use the put_object method perfectly with that same client. One strange thing though: when I want to put a file, I pass the whole path to put_object as Key (as I found it to be the way to go), but the object is inserted with the bucket name prepended to it. So let's say I call put_object(Key='/path/to/myfile.ext'); the object will end up as /bucket-name/path/to/myfile.ext.

Is this strange behavior the key to my problem? How can I investigate what's happening, or is there another way I could try to list bucket files?

Thank you

EDIT: So, after logging the request that the boto3 client is sending, I noticed that the bucket name is appended to the URL: instead of requesting https://<bucket_name>.s3.<region>.<provider>/, it requests https://<bucket_name>.s3.<region>.<provider>/<bucket-name>/, which leads to the NoSuchKey error. I took a look into the botocore library, and I found this:

url = _urljoin(endpoint_url, r['url_path'], host_prefix)

in botocore.awsrequest line 252, where r['url_path'] contains /skichic-bucket?list-type=2. So from here, I should be able to easily patch the library core to make it work for me.
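To see why a bucket-specific endpoint produces that doubled path, here is a minimal sketch of the join (this is a simplification for illustration, not botocore's actual code, and the Scaleway region fr-par and bucket name below are made-up examples):

```python
def build_url(endpoint_url, url_path):
    # Simplified sketch of joining an endpoint and an operation's
    # url_path (the real logic lives in botocore.awsrequest._urljoin).
    return endpoint_url.rstrip('/') + '/' + url_path.lstrip('/')

# With a bucket-specific endpoint (Scaleway style), the path-style
# url_path "/{Bucket}?list-type=2" ends up doubling the bucket name:
print(build_url('https://my-bucket.s3.fr-par.scw.cloud', '/my-bucket?list-type=2'))
# -> https://my-bucket.s3.fr-par.scw.cloud/my-bucket?list-type=2

# With a generic endpoint, the same join produces the expected URL:
print(build_url('https://s3.fr-par.scw.cloud', '/my-bucket?list-type=2'))
# -> https://s3.fr-par.scw.cloud/my-bucket?list-type=2
```

The first URL asks the bucket for an object literally named like the bucket itself, hence NoSuchKey.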

Plus, the Prefix argument is not working: whatever I pass into it, I always receive the whole bucket content. But I guess I can easily patch this too.

Now, this isn't satisfying: since there's no issue related to this on GitHub, I can't believe that the library contains such a bug and that I'm the first one to encounter it.

Can anyone explain this whole mess? >.<

You can try this. You'll have to use your resource instead of my s3sr.

from boto3 import resource

s3sr = resource('s3')
bucket = 'your-bucket'
prefix = 'your-prefix/'  # if no prefix, pass ''

def get_keys_from_prefix(bucket, prefix):
    '''Gets the list of keys for the given bucket and prefix.'''
    keys_list = []
    paginator = s3sr.meta.client.get_paginator('list_objects_v2')
    # use Delimiter to limit the search to that level of the hierarchy
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # a page with no matching objects has no 'Contents' key
        keys = [content['Key'] for content in page.get('Contents', [])]
        print('keys in page: ', len(keys))
        keys_list.extend(keys)
    return keys_list

keys_list = get_keys_from_prefix(bucket, prefix)

After looking more closely into things, I've found out that a lot of botocore service endpoint patterns start with the bucket name. For example, here's the definition of the ListObjectsV2 operation:

"ListObjectsV2":{
      "name":"ListObjectsV2",
      "http":{
        "method":"GET",
        "requestUri":"/{Bucket}?list-type=2"
      },

My guess is that in the standard implementation of AWS S3, there's a generic endpoint_url (which explains @jordanm's comment), and the targeted bucket is reached through the endpoint.

Now, in the case of Scaleway, there's an endpoint_url for each bucket, with the bucket name contained in that URL (e.g. https://<bucket_name>.s3.<region>.<provider>), so any request path should start directly with an object Key.

I made a fork of botocore where I rewrote every endpoint to remove the bucket name, if that can help someone in the future.

Thanks again to all contributors!

For those who are facing the same issue, try changing the endpoint_url parameter in your boto3 client or resource instantiation from https://<bucket_name>.s3.<region>.<provider> to https://s3.<region>.<provider>; i.e. for Scaleway: https://s3.<region>.scw.cloud.

You can then set the Bucket parameter to select the bucket you want:

list_objects_v2(Bucket=<bucket_name>)
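Putting it together, a minimal client instantiation with the generic endpoint might look like the sketch below. The region fr-par and the credential/bucket placeholders are illustrative assumptions; substitute your own values.

```python
import boto3

# Illustrative values; replace with your own region, credentials, and bucket.
s3 = boto3.client(
    's3',
    region_name='fr-par',
    endpoint_url='https://s3.fr-par.scw.cloud',  # generic endpoint, no bucket name
    aws_access_key_id='<your-access-key>',
    aws_secret_access_key='<your-secret-key>',
)

# The bucket is now selected per call, and Prefix narrows the listing as expected.
response = s3.list_objects_v2(Bucket='<your-bucket>', Prefix='path/to/')
for obj in response.get('Contents', []):
    print(obj['Key'])
```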
