
Cycle through a subdirectory in Google Cloud Storage

I have a storage bucket in Google Cloud. I created a few directories in it, each containing files.

I know that if I want to cycle through all the files in one of the directories, I can use the following command:

# source_bucket is a google.cloud.storage Bucket, e.g. storage.Client().bucket("bucket-name")
for file in source_bucket.list_blobs(prefix='subdir/subdir2'):
    file_path = f"gs://{file.bucket.name}/{file.name}"
    print(file_path)

However, the result includes the path of the directory itself that I am trying to cycle through:

gs://bucket-name/subdir/subdir2 <----- this item
gs://bucket-name/subdir/subdir2/file1
gs://bucket-name/subdir/subdir2/file2
....

Is there a way to cycle through the directory without the directory itself appearing, so that the output looks like this?

gs://bucket-name/subdir/subdir2/file1
gs://bucket-name/subdir/subdir2/file2
....

I managed to do this:

subdir = 'subdir/subdir2/'

for file in source_bucket.list_blobs(prefix=subdir):
    # Skip the object whose name is exactly the "directory" prefix
    if file.name == subdir:
        continue
    print(f"gs://{file.bucket.name}/{file.name}")

But is there a cleaner way to do it using the Google Cloud Storage API? I looked through the documentation but did not find anything like that.

Cloud Storage does not actually have directories; it has a flat namespace. The Console only makes it look hierarchical by treating the "/" in object names like a file-system path separator. So when you request all the objects in a specific "folder", you are really requesting every object whose name starts with the same prefix. That is why you get the entire "sub-hierarchy" as a result, including any object named exactly like the "folder" itself.
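Since there is no real directory to exclude, filtering on the client side is a reasonable approach. The sketch below generalizes the check from the question slightly: it skips any object whose name ends in "/", which is how a "folder" placeholder object is typically named. This is an illustration under that assumption, not a built-in API feature. `iter_file_uris` is a hypothetical helper, and the stand-in objects only imitate the `.name` and `.bucket.name` attributes that the blobs returned by `list_blobs` expose.

```python
from types import SimpleNamespace


def iter_file_uris(blobs):
    """Yield gs:// URIs for blobs, skipping "folder" placeholder objects.

    Assumption: a placeholder is an object whose name ends with "/".
    """
    for blob in blobs:
        if blob.name.endswith("/"):
            continue
        yield f"gs://{blob.bucket.name}/{blob.name}"


# Stand-in objects so the sketch runs without credentials or network access.
bucket = SimpleNamespace(name="bucket-name")
fake_blobs = [
    SimpleNamespace(name=name, bucket=bucket)
    for name in ("subdir/subdir2/", "subdir/subdir2/file1", "subdir/subdir2/file2")
]

for uri in iter_file_uris(fake_blobs):
    print(uri)
```

With the real client, the same helper could be called as `iter_file_uris(source_bucket.list_blobs(prefix='subdir/subdir2/'))`.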

You can check https://cloud.google.com/storage/docs/naming-objects for more information. This is the relevant bit:

Object names reside in a flat namespace within a bucket. This means that:

 - Different buckets can have objects with the same name.
 - Objects do not reside within subdirectories in a bucket.

For example, you can name an object /europe/france/paris.jpg to make it appear that paris.jpg resides in the subdirectory /europe/france, but to Cloud Storage, the object simply exists in the bucket and has the name /europe/france/paris.jpg. As a result, while deeply nested, directory-like structures using slash delimiters are possible within Cloud Storage, they don't have the performance that a native filesystem has when listing deeply nested sub-directories.
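Because names live in a flat namespace, a prefix is matched as a plain string rather than as a path. The small simulation below illustrates one consequence: a prefix without a trailing slash can also match sibling "folders" that merely start with the same characters. `match_prefix` is a hypothetical stand-in for server-side prefix filtering, and the object name `subdir/subdir2-old/file9` is invented for the example.

```python
def match_prefix(names, prefix):
    """Mimic prefix listing: keep the names that start with the prefix."""
    return [name for name in names if name.startswith(prefix)]


object_names = [
    "subdir/subdir2/",           # placeholder object for the "folder"
    "subdir/subdir2/file1",
    "subdir/subdir2-old/file9",  # hypothetical sibling "folder"
]

# Without a trailing slash, the sibling "folder" is matched as well.
loose = match_prefix(object_names, "subdir/subdir2")

# With a trailing slash, only names under subdir/subdir2/ are matched.
strict = match_prefix(object_names, "subdir/subdir2/")

print(loose)
print(strict)
```

Ending the prefix with "/" therefore tends to be the safer choice when the intent is to list a single "folder".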
