Measure disk space of certain file types in aggregate

I have some files across several folders:

/home/d/folder1/a.txt
/home/d/folder1/b.txt
/home/d/folder1/c.mov
/home/d/folder2/a.txt
/home/d/folder2/d.mov
/home/d/folder2/folder3/f.txt

How can I measure the grand total amount of disk space taken up by all the .txt files in /home/d/?

I know du will give me the total space of a given folder, and ls -l will give me the size of individual files, but what if I want to add up all the .txt files and see a single grand total for every .txt file in /home/d/, including folder1, folder2, and subfolders like folder3?

find folder1 folder2 -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1

This will report disk space usage in bytes by extension:

find . -type f -printf "%f %s\n" |
  awk '{
      PARTSCOUNT=split( $1, FILEPARTS, "." );
      EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
      FILETYPE_MAP[EXTENSION]+=$2
    }
   END {
     for( FILETYPE in FILETYPE_MAP ) {
       print FILETYPE_MAP[FILETYPE], FILETYPE;
      }
   }' | sort -n

Output:

3250 png
30334451 mov
57725092729 m4a
69460813270 3gp
79456825676 mp3
131208301755 mp4
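One limitation of the command above is that it prints the file name before the size, so a name containing spaces shifts the awk fields. A minimal sketch of a variant that prints the size first is shown here. It assumes GNU find (for -printf), and sizes_by_ext is a hypothetical helper name, not something from the answer. Names containing newlines would still be miscounted.

```shell
# Sketch: per-extension byte totals, tolerant of spaces in file names.
# Assumes GNU find (-printf); "sizes_by_ext" is a hypothetical helper name.
sizes_by_ext() {
    find "${1:-.}" -type f -printf '%s %f\n' |
    awk '{
        size = $1
        name = substr($0, length($1) + 2)   # everything after "<size> "
        n = split(name, parts, ".")
        ext = (n == 1) ? "NULL" : parts[n]  # same "NULL" label as above
        total[ext] += size
    }
    END { for (ext in total) print total[ext], ext }' | sort -n
}
```

Calling sizes_by_ext /home/d would print one "bytes extension" line per extension, smallest first.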

Simple:

du -ch *.txt

If you just want the total space taken to show up, then:

du -ch *.txt | tail -1

Here's a way to do it (in Linux, using GNU coreutils du and Bash syntax), avoiding bad practice:

total=0
while read -r line
do
    size=($line)
    (( total+=size ))
done < <( find . -iname "*.txt" -exec du -b {} + )
echo "$total"

If you want to exclude the current directory, use -mindepth 2 with find.

Another version that doesn't require Bash syntax:

find . -iname "*.txt" -exec du -b {} + | awk '{total += $1} END {print total}'

Note that these won't work properly with file names which include newlines (but those with spaces will work).
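For names that do contain newlines, one option is to keep the names NUL-delimited from find through to du, so the names are never parsed as text. The following is a sketch only. It assumes GNU find and GNU du (for --files0-from and -b), and txt_bytes_nul is a hypothetical helper name.

```shell
# Sketch: total apparent size in bytes of *.txt files under a directory,
# safe for file names containing spaces or newlines.
# Assumes GNU find and GNU du; "txt_bytes_nul" is a hypothetical helper name.
txt_bytes_nul() {
    find "${1:-.}" -type f -iname '*.txt' -print0 |
        du -b -c --files0-from=- |   # read the NUL-separated names from stdin
        tail -n 1 |                  # the last line is always the grand total
        cut -f1                      # keep only the number
}
```

Only the final "total" line is read, so newlines inside the per-file lines do not affect the result.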

This will do it:

total=0
for file in *.txt
do
    space=$(ls -l "$file" | awk '{print $5}')
    let total+=space
done
echo $total

macOS

  • Use du with the -I parameter to exclude all other files.

Linux

-X, --exclude-from=FILE
              exclude files that match any pattern in FILE

--exclude=PATTERN
              exclude files that match PATTERN

With GNU find:

find /home/d -type f -name "*.txt" -printf "%s\n" | awk '{s+=$0}END{print "total: "s" bytes"}'
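If a human-readable figure is preferred over raw bytes, the byte total can be passed through numfmt from GNU coreutils. This is a sketch under the assumption that GNU find and numfmt are available. txt_total_human is a hypothetical helper name. The explicit printf format is a precaution, since some awk implementations may print very large sums in exponent notation.

```shell
# Sketch: total size of *.txt files, printed in IEC units (K, M, G, ...).
# Assumes GNU find (-printf) and GNU coreutils numfmt;
# "txt_total_human" is a hypothetical helper name.
txt_total_human() {
    find "${1:-.}" -type f -name '*.txt' -printf '%s\n' |
        awk '{ s += $1 } END { printf "%.0f\n", s }' |  # plain integer, no exponent
        numfmt --to=iec
}
```

Usage would look like txt_total_human /home/d.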

A one-liner for those with GNU tools on bash:

for i in $(find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do echo "$i"": ""$(du -hac **/*."$i" | tail -n1 | awk '{print $1;}')"; done | sort -h -k 2 -r

You must have globstar enabled (Bash 4 or later) so that ** matches recursively:

shopt -s globstar

If you want dot files to work, you must run

shopt -s dotglob

Sample output:

d: 3.0G
swp: 1.3G
mp4: 626M
txt: 263M
pdf: 238M
ogv: 115M
i: 76M
pkl: 65M
pptx: 56M
mat: 50M
png: 29M
eps: 25M

etc.

Building on ennuikiller's answer, this will handle spaces in names. I needed to do this and get a little report:

find -type f -name "*.wav" | grep export | ./calc_space

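#!/bin/bash
# calc_space: read file names from stdin (one per line),
# print each file's size and then the total, in megabytes
echo SPACE USED IN MEGABYTES
echo
total=0
# IFS= and -r keep leading/trailing spaces and backslashes in names intact
while IFS= read -r FILE
do
    du -m "$FILE"
    space=$(du -m "$FILE" | awk '{print $1}')
    let total+=space
done
echo $total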

I like to use find in combination with xargs:

find . -name "*.txt" -print0 |xargs -0 du -ch

Add tail if you only want to see the grand total:

find . -name "*.txt" -print0 |xargs -0 du -ch | tail -n1
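One caveat with the xargs form: when the file list is too long for a single command line, xargs runs du more than once, and each run prints its own total line, so tail -n1 would show only the last batch. A possible workaround, sketched here, is to add up every total line. It assumes an xargs that supports -r (as GNU xargs does) and reports kilobytes rather than human-readable units so the values can be summed. txt_total_kb is a hypothetical helper name.

```shell
# Sketch: sum the per-batch "total" lines that du -c prints under xargs.
# Assumes xargs supports -r (skip the run when there is no input);
# "txt_total_kb" is a hypothetical helper name.
txt_total_kb() {
    find "${1:-.}" -type f -name '*.txt' -print0 |
        xargs -0 -r du -ck |
        awk -F'\t' '$2 == "total" { kb += $1 } END { print kb + 0 }'
}
```

Because find prefixes every path with its starting directory, a regular file that happens to be called total is not mistaken for a grand-total line.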

My solution gets the total size of all text files in a given path and its subdirectories (using a Perl one-liner):

find /path -iname '*.txt' | perl -lane '$sum += -s $_; END {print $sum}'

For anyone wanting to do this with macOS at the command line, you need a variation based on the -print0 argument instead of -printf. Some of the above answers address that, but this will do it comprehensively by extension:

find . -type f -print0 | xargs -0 stat -f "%N %z" |
  awk '{
      PARTSCOUNT=split( $1, FILEPARTS, "." );
      EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
      FILETYPE_MAP[EXTENSION]+=$2
    }
   END {
     for( FILETYPE in FILETYPE_MAP ) {
       print FILETYPE_MAP[FILETYPE], FILETYPE;
      }
   }' | sort -n

There are several potential problems with the accepted answer:

  1. It does not descend into subdirectories (without relying on non-standard shell features like globstar).
  2. In general, as pointed out by Dennis Williamson below, you should avoid parsing the output of ls.
    • Namely, if the user or group (columns 3 and 4) have spaces in them, column 5 will not be the file size.
  3. If you have a million such files, this will spawn two million subshells, and it'll be sloooow.

As proposed by ghostdog74, you can use the GNU-specific -printf option to find to achieve a more robust solution, avoiding all the excessive pipes, subshells, Perl, and weird du options:

# the '%s' format string means "the file's size"
find . -name "*.txt" -printf "%s\n" \
  | awk '{sum += $1} END{print sum " bytes"}'

Yes, yes, solutions using paste or bc are also possible, but not any more straightforward.

On macOS, you would need to use Homebrew or MacPorts to install findutils, and call gfind instead. (I see the "linux" tag on this question, but it's also tagged "unix".)

Without GNU find, you can still fall back to using du:

find . -name "*.txt" -exec du -k {} + \
  | awk '{kbytes+=$1} END{print kbytes " Kbytes"}'

…but you have to be mindful of the fact that du's default output is in 512-byte blocks for historical reasons (see the "RATIONALE" section of the man page), and some versions of du (notably, macOS's) will not even have an option to print sizes in bytes.
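If an exact byte count is needed without GNU find or a byte-oriented du, one fallback that relies only on POSIX tools is to concatenate the matching files and count the bytes. This is a sketch, not a recommendation for large trees: it reads every byte of every file, so it is far slower than the metadata-based approaches. txt_bytes_portable is a hypothetical helper name.

```shell
# Sketch: POSIX-only byte total for *.txt files (no GNU extensions).
# Slow on large data sets because every file is read in full.
# "txt_bytes_portable" is a hypothetical helper name.
txt_bytes_portable() {
    find "${1:-.}" -type f -name '*.txt' -exec cat {} + |
        wc -c |
        awk '{ print $1 }'   # strip the padding some wc versions add
}
```

The result is the apparent size (the sum of file lengths), which can differ from the block-based disk usage that du reports.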

Many other fine solutions here (see Barn's answer in particular), but most suffer the drawback of being unnecessarily complex or depending too heavily on GNU-only features, and maybe in your environment, that's OK!
