How to check if an AWS S3 path exists with the help of Python?

I am trying to write a Python program that checks whether a path exists. For example, given the path /root/subfolder1/subfolder2/ , I want to pass this path to the S3 API to check whether it exists in AWS S3.

I have tried this, but it is not a complete solution for my requirement:

import boto3
import botocore

# AccessKey and SecretAccessKey are assumed to be defined elsewhere.
client = boto3.client('s3', aws_access_key_id=AccessKey,
                      aws_secret_access_key=SecretAccessKey,
                      region_name='us-east-1')
result = client.list_objects(Bucket="full_poc", Prefix="sub_folder1/sub_folder2/full")
print(result)
exist = False
if "Contents" in result:
    exist = True

print(exist)

With this code, even if I pass sub instead of sub_folder1 , it prints True .

What are other ways to solve this problem?

S3 doesn't have folders:

In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.
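
To make the flat layout concrete, here is a small local sketch (no AWS calls) of how folder-like groupings can be derived from nothing but key names and a delimiter. The helper name common_prefixes and the sample keys are made up for illustration; it only imitates the kind of grouping described above and is not how S3 itself is implemented.

```python
from typing import Iterable, List


def common_prefixes(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> List[str]:
    """Return the folder-like groupings directly under `prefix`.

    Hypothetical helper: it works on a plain list of key names and
    never talks to S3.
    """
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Look only at the part of the key after the prefix.
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:  # a delimiter was found, so `head` acts like a folder name
            folders.add(prefix + head + delimiter)
    return sorted(folders)


# Sample keys, invented for this illustration.
keys = [
    "root/subfolder1/subfolder2/a.txt",
    "root/subfolder1/b.txt",
    "root/c.txt",
    "other.txt",
]
print(common_prefixes(keys))                      # ['root/']
print(common_prefixes(keys, "root/subfolder1/"))  # ['root/subfolder1/subfolder2/']
```

Nothing in the key list is a folder; the "folders" appear only because several keys share the same leading string.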

The only way that /root/subfolder1/subfolder2/ can "exist" is if you have an object whose key begins with /root/subfolder1/subfolder2/ . List the objects in your bucket and see if any begin with that prefix, e.g. something like

# bucket is assumed to be a boto3 Bucket resource; each item exposes its name as .key
any(obj.key.startswith("/root/subfolder1/subfolder2/") for obj in bucket.objects.all())
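
Listing every object can be slow on a large bucket. As a rough sketch of the same idea, the listing can be restricted on the server side with the Prefix and MaxKeys parameters of list_objects_v2. Appending a trailing "/" to the prefix is what stops sub from matching sub_folder1/ . The helper name prefix_exists is hypothetical, and client is assumed to be a boto3 S3 client (or anything with a compatible list_objects_v2 method).

```python
from typing import Any


def prefix_exists(client: Any, bucket: str, prefix: str) -> bool:
    """Return True if at least one key starts with `prefix` treated as a folder.

    Hypothetical helper; `client` is assumed to be a boto3 S3 client.
    """
    # Force a trailing slash so "sub" cannot match "sub_folder1/...".
    if not prefix.endswith("/"):
        prefix += "/"
    # One key is enough to prove the prefix is in use.
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Example use with a real client (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("s3")
#   print(prefix_exists(client, "full_poc", "sub_folder1/sub_folder2"))
```

The prefix is otherwise passed through unchanged, so a leading "/" is kept only if the keys in the bucket really start with one.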

There is no such thing as a folder in S3. A folder is basically an empty object whose key ends with '/'. We can check two things:

  • getObject (get_object in boto3) returns an empty body.
  • The key name ends with / before getObject is called. The reason for this check is that we don't want to fetch the actual object unless we know it is a folder name, since that would cause unnecessary data transfer.

If the object doesn't exist, getObject raises an error, which we can simply catch.

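import boto3

s3 = boto3.client('s3')
key = 'myfolder1/subfolder/'
folder = False  # default, so the flag is defined on every path
try:
    # Only fetch keys ending in '/', to avoid downloading a real object.
    if key.endswith('/'):
        obj = s3.get_object(Bucket='my-bucket', Key=key)
        # A folder marker has an empty body.
        folder = len(obj['Body'].read()) == 0
except Exception:
    # get_object raises if the key does not exist.
    folder = False
if folder:
    print("Yes, it's a folder")
else:
    print("No, it's not")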
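
A possible variation on the check above, sketched under the same assumptions: head_object returns only the object's metadata, so the size can be read from ContentLength without transferring a body at all. The helper name is_folder_marker is hypothetical, and client is assumed to be a boto3 S3 client. Note that this only finds folders that exist as an explicit empty marker object; a prefix that merely has objects under it will not be detected this way.

```python
from typing import Any


def is_folder_marker(client: Any, bucket: str, key: str) -> bool:
    """Return True if `key` is an empty object whose name ends with '/'.

    Hypothetical helper; `client` is assumed to be a boto3 S3 client.
    """
    if not key.endswith("/"):
        return False
    try:
        # Metadata only: no object body is downloaded.
        head = client.head_object(Bucket=bucket, Key=key)
    except client.exceptions.ClientError as err:
        # A missing key is reported by head_object as a 404 error.
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # permission problems etc. should not be hidden
    return head["ContentLength"] == 0


# Example use with a real client (needs boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   print(is_folder_marker(s3, "my-bucket", "myfolder1/subfolder/"))
```

Unlike a bare except Exception, this lets errors other than "not found" surface instead of being reported as "not a folder".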
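
Another option, if TensorFlow is available, is its file API:

import os
import tensorflow as tf

os.environ['AWS_REGION'] = 'us-west-2'
os.environ['S3_ENDPOINT'] = 's3-us-west-2.amazonaws.com'
# 's3path' is a placeholder for the S3 path to check.
# tf.io.gfile.exists is the current name; older TF 1.x code used tf.gfile.Exists.
print(tf.io.gfile.exists('s3path'))  # returns True or False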
