
Fastest way to get the file count and total size of a folder in GCS?

Assume there is a bucket with a folder root, which has subfolders and files. Is there any way to get the total file count and total size of the root folder?

What I tried: with gsutil du I get the size quickly, but not the count. With gsutil ls ___ I get the list and sizes, and if I pipe the output through awk and sum them I might get the expected result, but ls itself takes a lot of time.

So is there a better/faster way to handle this?

If you want to track the count of objects in a bucket over a long time, Cloud Monitoring offers the metric "storage/object_count". The metric updates about once per day, which makes it more useful for long-term trends.
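As an example, here is a minimal sketch of reading that metric with the Cloud Monitoring Python client library. This assumes the google-cloud-monitoring package is installed; the project and bucket names below are placeholders, not values from the question:

import time
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"    # placeholder
BUCKET_NAME = "my-bucket"    # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
# Look back two days so at least one daily data point is included.
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 2 * 24 * 3600},
        "end_time": {"seconds": now},
    }
)
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "storage.googleapis.com/storage/object_count" '
            f'AND resource.labels.bucket_name = "{BUCKET_NAME}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)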

As for counting instantaneously, unfortunately gsutil ls is probably your best bet.

Doing an object listing of some sort is the way to go - both the ls and du commands in gsutil perform object listing API calls under the hood.

If you want to get a summary of all objects in a bucket, check Cloud Monitoring (as mentioned in the docs). But this isn't applicable if you want statistics for a subset of objects - GCS doesn't support actual "folders", so all your objects under the "folder" foo are actually just objects named with a common prefix, foo/.

If you want to analyze the number of objects under a given prefix, you'll need to perform object listing API calls (either using a client library or using gsutil). The listing operations can only return so many objects per response and thus are paginated, meaning you'll have to make several calls if you have lots of objects under the desired prefix. The max number of results per listing call is currently 1,000. So as an example, if you had 200,000 objects to list, you'd have to make 200 sequential API calls.
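To illustrate the pagination, here is a minimal sketch using the Python client library (assuming the google-cloud-storage package is installed; "my-bucket" and "some-prefix/" are placeholder names):

from google.cloud import storage

client = storage.Client()
blobs = client.list_blobs("my-bucket", prefix="some-prefix/")

# The iterator issues the paginated listing calls under the hood,
# fetching up to 1,000 results per underlying API request.
count = 0
for page in blobs.pages:
    count += page.num_items
print(count, "objects")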

A note on gsutil's ls:

There are several scenarios in which gsutil can do "extra" work when completing an ls command, like when doing a "long" listing using the -L flag or performing recursive listings using the -r flag. To save time and perform the fewest number of listings possible in order to obtain a total count of bytes under some prefix, you'll want to do a "flat" listing using gsutil's wildcard support, e.g.:

gsutil ls -l gs://my-bucket/some-prefix/**

Alternatively, you could try writing a script using one of the GCS client libraries, like the Python library and its list_blobs functionality.
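For example, a minimal sketch of such a script, again assuming the google-cloud-storage package and placeholder bucket/prefix names:

from google.cloud import storage

def count_and_size(bucket_name, prefix):
    # Returns (object count, total bytes) for all objects under the prefix.
    client = storage.Client()
    count = 0
    total_bytes = 0
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        count += 1
        total_bytes += blob.size or 0
    return count, total_bytes

count, total = count_and_size("my-bucket", "some-prefix/")
print(count, "objects,", total, "bytes")

Like gsutil, this still has to page through every object under the prefix, so the runtime grows with the number of objects; there is no server-side aggregate for a prefix.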
