
Couldn't get the right file size using du

I see that my bucket on AWS S3 storage is 13.2 GiB in size and contains 1570 files:

$ aws s3 ls --summarize --human-readable s3://mybucket/ | grep -E "(Total\sObjects|Total\sSize)"
Total Objects: 1570
   Total Size: 13.2 GiB

After downloading this bucket, here is what I see:

$ du -sh ./test
14G
$ find ./test -type f | wc -l
1570
$ du -sb ./test
14204477032
$ du -sb ./test | awk '{ \
            split( "B KB MB GB" , v ); \
            s=1; \
            while( $1>=1024 ) { \
                $1/=1024; s++ \
            } \
            printf "%.1f%s", $1, v[s] \
        }'
13.2GB

How can I achieve the same result using standard Linux tools?

Thanks

du was originally meant for finding out how much space a file occupies on the storage medium (disk). That is the main reason it rounds up rather than down: a block that has been allocated always counts as completely "used", even if only two bytes of it hold data.

Your case seems to be about counting the bytes in the files, regardless of the storage space they occupy. For this, du has the option --apparent-size. Instead of disk usage, it then reports the files' sizes. Combined with --block-size=1, this can be spelled more simply as -b.
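To make the difference between allocated space and apparent size concrete, here is a minimal sketch. It assumes GNU du; the helper names apparent_bytes and allocated_bytes and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare allocated disk usage with apparent size for a tiny file.
# Assumes GNU du; helper names and the temp file are hypothetical.

apparent_bytes() {
    # -b is shorthand for --apparent-size --block-size=1
    du -sb "$1" | cut -f1
}

allocated_bytes() {
    # Without --apparent-size, du reports the space allocated on disk
    du -s --block-size=1 "$1" | cut -f1
}

tmpfile=$(mktemp)
printf 'hi' > "$tmpfile"    # exactly two bytes of content

echo "apparent:  $(apparent_bytes "$tmpfile") bytes"
echo "allocated: $(allocated_bytes "$tmpfile") bytes"

rm -f "$tmpfile"
```

The apparent size should come out as 2. The allocated figure depends on the filesystem; it is often one full block, but that is not guaranteed.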

Next, you want to convert a large number like 14204477032 into a neat form like 13.2GB. You also state in a comment that 14G (as -h produces) isn't precise enough for your taste, and you provide an awk script that does exactly this conversion, so you already have a working solution.
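For reuse, the same conversion could be wrapped in a small shell function. This is only a sketch of the asker's approach, not a replacement for it: the name human_size is made up, the unit labels are changed to KiB/MiB/GiB to match the aws output (an assumption about what is wanted), and a bound on the unit index is added so that very large values do not run past the last unit.

```shell
#!/bin/sh
# Sketch: format a byte count with one decimal and a binary unit.
# human_size is a hypothetical helper name, not a standard tool.

human_size() {
    awk -v bytes="$1" 'BEGIN {
        n = split("B KiB MiB GiB TiB", unit, " ")
        i = 1
        # Divide by 1024 until the value fits, stopping at the last unit
        while (bytes >= 1024 && i < n) {
            bytes /= 1024
            i++
        }
        printf "%.1f%s\n", bytes, unit[i]
    }'
}

human_size 14204477032
```

With the byte count from the question this prints 13.2GiB. It could be fed from du as human_size "$(du -sb ./test | cut -f1)".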

I'm not aware of any standard Unix tool, other than awk or more complex things like perl or python, that would do this in a much easier fashion. Other people are looking for a solution to this as well, and yours is among the best ones.

So my advice is simply this: stick with your solution. The only change I'd consider is bit-shifting ( >> 10 ) instead of division ( / 1024 ), but that is a matter of taste and comes with caveats: POSIX awk has no shift operator (gawk provides rshift() instead), and an integer shift discards the fractional part that the %.1f output relies on.
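A quick way to compare the two approaches is shell arithmetic, which does support >> on integers. In this sketch the helper names shift_gib and div_gib are made up, and a shell with 64-bit arithmetic is assumed.

```shell
#!/bin/sh
# Sketch: shifting versus dividing when converting bytes to GiB.
# shift_gib and div_gib are hypothetical helper names.

shift_gib() {
    # Shifting right by 30 bits divides by 1024^3 and truncates
    echo $(( $1 >> 30 ))
}

div_gib() {
    # Floating-point division in awk keeps the fractional part
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

echo "shift:    $(shift_gib 14204477032) GiB"
echo "division: $(div_gib 14204477032) GiB"
```

The shift yields the whole number 13, while the division yields 13.2, so shifting only fits when whole units are enough.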
