
Fastest way to get the file count and total size of a folder in GCS?

Assume there is a bucket with a folder root that has subfolders and files. Is there any way to get the total file count and total size of the root folder?

What I tried: with gsutil du I get the size quickly, but not the count. With gsutil ls ___ I get the listing with sizes; if I pipe it through awk and sum them I get the expected result, but the ls itself takes a lot of time.

So is there a better/faster way to handle this?

If you want to track the count of objects in a bucket over a long time, Cloud Monitoring offers the metric "storage/object_count". The metric updates about once per day, which makes it more useful for long-term trends.

As for counting instantaneously, unfortunately gsutil ls is probably your best bet.

Doing an object listing of some sort is the way to go - both the ls and du commands in gsutil perform object listing API calls under the hood.

If you want to get a summary of all objects in a bucket, check Cloud Monitoring (as mentioned in the docs). But this isn't applicable if you want statistics for a subset of objects - GCS doesn't support actual "folders", so all your objects under the "folder" foo are actually just objects named with a common prefix, foo/ .

If you want to analyze the number of objects under a given prefix, you'll need to perform object listing API calls (either using a client library or using gsutil). The listing operations can only return so many objects per response and thus are paginated, meaning you'll have to make several calls if you have lots of objects under the desired prefix. The max number of results per listing call is currently 1,000. So as an example, if you had 200,000 objects to list, you'd have to make 200 sequential API calls.
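To make the pagination concrete, here is a minimal sketch (assuming the google-cloud-storage Python library; the bucket and prefix names are hypothetical) that counts how many listing pages - i.e. API calls - are needed for a given prefix:

```python
from google.cloud import storage

client = storage.Client()
# Hypothetical bucket and prefix, used only for illustration.
blobs = client.list_blobs("my-bucket", prefix="some-prefix/")

api_calls = 0
objects = 0
for page in blobs.pages:       # each page corresponds to one listing API call
    api_calls += 1
    objects += page.num_items  # up to 1,000 objects per page by default
print(f"{objects} objects listed in {api_calls} API calls")
```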

A note on gsutil's ls:

There are several scenarios in which gsutil can do "extra" work when completing an ls command, like when doing a "long" listing using the -L flag or performing recursive listings using the -r flag. To save time and perform the fewest listing calls possible in order to obtain a total count of bytes under some prefix, you'll want to do a "flat" listing using gsutil's wildcard support, e.g.:

gsutil ls -l gs://my-bucket/some-prefix/**

Alternatively, you could try writing a script using one of the GCS client libraries, like the Python library and its list_blobs functionality.
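For example, a minimal sketch along those lines (assuming the google-cloud-storage package; the bucket and prefix names are hypothetical) that totals both the object count and the byte size under a prefix:

```python
from google.cloud import storage

def count_and_size(bucket_name: str, prefix: str):
    """Return (object_count, total_bytes) for objects under the given prefix."""
    client = storage.Client()
    count = 0
    total_bytes = 0
    # list_blobs pages through the listing API transparently.
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        count += 1
        total_bytes += blob.size
    return count, total_bytes

# Hypothetical names, used only for illustration.
count, total = count_and_size("my-bucket", "some-prefix/")
print(f"{count} objects, {total} bytes")
```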
