
Google Cloud Storage + Python: Any way to list objects in a certain folder in GCS?

I'm writing a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get a list of the names of all objects in the folder, then check whether the file abc.txt is in that list.

The problem is that Google seems to provide only one way to get the object list, uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

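import boto

# DOGS_BUCKET and GOOGLE_STORAGE are not defined in this snippet; they come from the linked guide.
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print('  "%s"' % obj.get_contents_as_string())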

The drawback of uri.get_bucket() is that it appears to fetch all of the objects first, which I don't want. I just need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.

Could someone help? I appreciate every answer!

Update: the answer below applies to the older "Google API Client Libraries" for Python. If you're not already using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent of the code below is:

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
  print(str(blob))
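
To get from this listing to the check the question asks for, one option is to collect the object names under the prefix and test membership. The following is a minimal sketch, not a definitive implementation: `file_in_folder` is a hypothetical helper name, and it assumes the file sits directly inside the folder. It appends `/` to the prefix because prefix matching is a plain string match, so `abc/myfolder` would otherwise also match objects under `abc/myfolder2/`.

```python
def file_in_folder(client, bucket_name, folder, filename):
    """Return True if `filename` sits directly inside `folder` of the bucket.

    `client` is assumed to be a google.cloud.storage.Client (anything with a
    compatible list_blobs method works). Illustrative helper, not library API.
    """
    # Normalise to exactly one trailing slash so that "abc/myfolder" does not
    # also match "abc/myfolder2/...".
    prefix = folder.rstrip("/") + "/"
    names = {blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)}
    return prefix + filename in names


# Usage (requires google-cloud-storage and credentials):
#   from google.cloud import storage
#   file_in_folder(storage.Client(), "mybucket", "abc/myfolder", "abc.txt")
```

This still lists everything under the prefix, so for a single known object name a direct existence check is likely cheaper.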

The answer for the older client follows.

You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you can use to check a certain directory and its children like this:

import json

from googleapiclient import discovery  # older releases expose this as `apiclient`

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
  response = request.execute()
  print(json.dumps(response, indent=2))
  # list_next lives on the objects() resource, not on the request
  request = client.objects().list_next(request, response)

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list

The Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
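
Each page returned by `objects().list` is a dictionary whose `items` entry holds the object resources, so the names can be pulled out directly instead of dumping the whole response. A rough sketch, assuming a `client` built with `discovery.build('storage', 'v1')` as in this answer; `iter_object_names` is a hypothetical helper, not part of the client library.

```python
def iter_object_names(client, bucket, prefix):
    """Yield object names under `prefix`, following pagination.

    `client` is assumed to be the result of discovery.build('storage', 'v1').
    """
    objects = client.objects()
    request = objects.list(bucket=bucket, prefix=prefix)
    while request is not None:
        response = request.execute()
        # "items" is absent when a page contains no matching objects.
        for item in response.get("items", []):
            yield item["name"]
        # list_next returns None once there are no more pages.
        request = objects.list_next(request, response)


# Usage, with the question's example names:
#   "abc/myfolder/abc.txt" in set(iter_object_names(client, "mybucket", "abc/myfolder/"))
```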

This worked for me:

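from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)

# No prefix is given, so this iterates over every object in the bucket.
blobs = bucket.list_blobs()

for blob in blobs:
    print(blob.name)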

The list_blobs() method returns an iterator over the blobs in the bucket. You can iterate over it to access every object in the bucket. In this example I just print each object's name.

The documentation helped me a lot.

I hope this helps!
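
Iterating over every blob gets slow on large buckets, which is the concern raised in the question. `list_blobs` also accepts `prefix` and `delimiter` arguments; with `delimiter='/'` the service returns only the objects directly under the prefix and reports deeper levels as prefixes. A minimal sketch under those assumptions follows; `list_folder` is a hypothetical helper, and the iterator's `prefixes` attribute is only populated once the iterator has been consumed.

```python
def list_folder(client, bucket_name, folder):
    """Return (file_names, subfolder_prefixes) for one "folder" level.

    `client` is assumed to be a google.cloud.storage.Client.
    """
    prefix = folder.rstrip("/") + "/"
    iterator = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
    # Consume the iterator first; iterator.prefixes is filled in page by page.
    # Skip the zero-byte placeholder object that some tools create for folders.
    files = [blob.name for blob in iterator if blob.name != prefix]
    subfolders = sorted(iterator.prefixes)
    return files, subfolders


# Usage (requires google-cloud-storage and credentials):
#   from google.cloud import storage
#   files, subfolders = list_folder(storage.Client(), "DEMO_BUCKET", "abc/myfolder")
```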

You might also want to look at gcloud-python and its documentation.

from gcloud import storage

# project_name, email and private_key_path are your own credentials.
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')

for key in bucket:
  if key.name == 'abc.txt':
    print('Found it!')
    break

However, you might be better off just checking whether the file exists:

if 'abc.txt' in bucket:
  print('Found it!')
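
With the newer `google-cloud-storage` library, the same direct check can be written with `Blob.exists()`, which asks the service about a single object instead of listing the folder. A small sketch, assuming the full object name (folder path included) is known; `object_exists` is a hypothetical wrapper, not a library function.

```python
def object_exists(client, bucket_name, object_name):
    """Return True if `object_name` exists in `bucket_name`.

    `client` is assumed to be a google.cloud.storage.Client.
    """
    # client.bucket() and bucket.blob() only build local handles;
    # exists() is the call that contacts the service.
    return client.bucket(bucket_name).blob(object_name).exists()


# Usage (requires google-cloud-storage and credentials):
#   from google.cloud import storage
#   if object_exists(storage.Client(), "mybucket", "abc/myfolder/abc.txt"):
#       print("Found it!")
```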

Install the Python package google-cloud-storage with pip or PyCharm, then use the code below:

from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
  print(str(blob))

I know this is an old question, but I stumbled over it because I was looking for exactly the same answer. The answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to go into more detail.

When you run this:

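from google.cloud import storage
storage_client = storage.Client()
# Keep nextPageToken in the field selector so that results beyond the first page are still fetched.
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))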

You will get Blob objects carrying only the name field for every object under the given prefix, like this:

[<Blob: BUCKET_NAME, PREFIX, None>, 
 <Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>, 
 <Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
 ...]

If you are like me and you want to 1) filter out the first item in the list because it does NOT represent a file (it's just the prefix), 2) get only the name string value, and 3) remove the PREFIX from the file name, you can do something like this:

blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]

Complete code to get just the file names, as strings, from a storage bucket:

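from google.cloud import storage
storage_client = storage.Client()
# Keep nextPageToken in the field selector so that results beyond the first page are still fetched.
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))
# Skip the entry that is just the prefix itself, then strip the prefix from each name.
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
print(f"blob_names = {blob_names}")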
