Google Cloud Storage + Python: any way to list objects in a certain folder in GCS?
I'm writing a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in the folder (a list of file names), then check whether the file abc.txt is in that list.
The problem is that Google seems to provide only one way to get the object list, uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print(' "%s"' % obj.get_contents_as_string())
The drawback of uri.get_bucket() is that it appears to fetch all of the objects first, which I don't want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
Could someone help answer this? I appreciate every answer!
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python (https://googleapis.dev/python/storage/latest/index.html). For the newer library, the equivalent to the below code is:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(str(blob))
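To answer the original question directly (is abc.txt in gs://mybucket/abc/myfolder?), the listing can be wrapped in a small check. This is only a sketch: the helper name `file_in_folder` is made up for illustration, and it assumes `client` behaves like `storage.Client`, i.e. it has `list_blobs(bucket_name, prefix=...)` returning objects with a `name` attribute. A trailing slash is added to the prefix because a prefix of `abc/myfolder` would otherwise also match `abc/myfolder2/...`.

```python
def file_in_folder(client, bucket_name, folder, filename):
    """Return True if `folder/filename` is listed directly under `folder/`.

    `client` is assumed to behave like google.cloud.storage.Client.
    """
    # Normalise to exactly one trailing slash so "abc/myfolder" does not
    # also match "abc/myfolder2/...".
    prefix = folder.rstrip("/") + "/"
    target = prefix + filename
    # The listing is consumed lazily; any() stops at the first match.
    return any(
        blob.name == target
        for blob in client.list_blobs(bucket_name, prefix=prefix)
    )
```

With the real library this would be called as `file_in_folder(storage.Client(), 'mybucket', 'abc/myfolder', 'abc.txt')`.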
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
import json
from apiclient import discovery

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    # list_next is a method of the objects() collection, not of the request
    request = client.objects().list_next(request, response)
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
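If only the names are needed rather than the full JSON dump, the same pagination loop can collect them into a list. This is a rough sketch: `list_object_names` is a hypothetical helper, and `objects` is assumed to be the resource returned by `client.objects()` on a client built with `discovery.build('storage', 'v1')`. It relies on each response page carrying its objects under the `items` key, which is absent when a page has no matches.

```python
def list_object_names(objects, bucket, prefix):
    """Collect object names from every page of an objects().list() call."""
    names = []
    request = objects.list(bucket=bucket, prefix=prefix)
    while request is not None:
        response = request.execute()
        # A page with no matching objects has no "items" key at all.
        names.extend(item["name"] for item in response.get("items", []))
        # list_next returns None once there are no further pages.
        request = objects.list_next(request, response)
    return names
```

The check from the question then becomes something like `'abc/myfolder/abc.txt' in list_object_names(client.objects(), 'mybucket', 'abc/myfolder/')`.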
This worked for me:
from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.name)
The list_blobs() method returns an iterator over the blobs in the bucket. You can then iterate over blobs and access every object in the bucket. In this example I just print out the name of each object.
This documentation helped me a lot:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket
I hope this helps!
You might also want to look at gcloud-python and its documentation.
from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')
for key in bucket:
    if key.name == 'abc.txt':
        print('Found it!')
        break
However, you might be better off just checking if the file exists:
if 'abc.txt' in bucket:
    print('Found it!')
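The snippets in this answer use the old gcloud package. In the current google-cloud-storage library, the same "just check whether it exists" idea can be sketched with `Blob.exists()`, which asks about that one object instead of listing anything. The helper name `object_exists` is hypothetical, and `client` is assumed to behave like `storage.Client`.

```python
def object_exists(client, bucket_name, object_name):
    """Return True if the object exists, without listing the bucket."""
    # client.bucket() and bucket.blob() only build local references;
    # exists() is the call that actually contacts the service.
    return client.bucket(bucket_name).blob(object_name).exists()
```

Note that `object_name` is the full object path, so for the folder in the question it would be `'abc/myfolder/abc.txt'`.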
Install the Python package google-cloud-storage with pip or PyCharm, then use the code below:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
    print(str(blob))
I know this is an old question, but I stumbled over it because I was looking for the exact same answer. The answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to go into more detail.
When you run this:
from google.cloud import storage
storage_client = storage.Client()
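# nextPageToken is requested as well so that results beyond the first page are fetched
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))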
You will get Blob objects carrying just the name field, one for each object under PREFIX in the given bucket, like this:
[<Blob: BUCKET_NAME, PREFIX, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
...]
If you are like me and you want to 1) filter out the first item in the list because it does NOT represent a file - it's just the prefix, 2) just get the name string value, and 3) remove the PREFIX from the file name, you can do something like this:
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
Complete code to get just the file names, as strings, from a storage bucket:
from google.cloud import storage
storage_client = storage.Client()
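# nextPageToken is requested as well so that results beyond the first page are fetched
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))
# Skip the placeholder entry whose name is the prefix itself, and strip the prefix from the rest
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]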
print(f"blob_names = {blob_names}")
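The filtering step can also be pulled out into a plain function, which makes it easy to try without touching GCS. This is a sketch only: `relative_names` is a made-up name, and it assumes the prefix ends with a slash, as the output shown earlier suggests.

```python
def relative_names(names, prefix):
    """Strip `prefix` from each object name, skipping the folder placeholder."""
    return [
        name[len(prefix):]
        for name in names
        # Keep real objects under the prefix; drop the entry that is
        # nothing but the prefix itself.
        if name.startswith(prefix) and name != prefix
    ]
```

It would be fed with `[blob.name for blob in blobs]`, after which the original question reduces to `'abc.txt' in relative_names(...)`.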