Why does the wc utility generate multiple lines with “total”?

I am using the wc utility in a shell script that I run from Cygwin, and I noticed that there is more than one line with "total" in its output.

The following function is used to count the number of lines in my source files:

count_curdir_src() {
    find . '(' -name '*.vb' -o -name '*.cs' ')' \
        -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | \
    xargs -0 wc -l
}

But its output for a certain directory looks like this:

$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | xargs -0 wc -l
     19 ./dirA/fileABC.cs
    640 ./dirA/subdir1/fileDEF.cs
    507 ./dirA/subdir1/fileGHI.cs
   2596 ./dirA/subdir1/fileJKL.cs
(...many others...)
     58 ./dirB/fileMNO.cs
     36 ./dirB/subdir1/filePQR.cs
 122200 total
  6022 ./dirB/subdir2/subsubdir/fileSTU.cs
    24 ./dirC/fileVWX.cs
(...)
    36 ./dirZ/Properties/AssemblyInfo.cs
    88 ./dirZ/fileYZ.cs
 25236 total

It looks like wc resets somewhere in the process. It cannot be caused by space characters in file or directory names, because I use the -print0 option. And it only happens when I run it on my largest source tree.

So, is this a bug in wc, or in Cygwin? Or something else? The wc manpage says:

Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. 打印每个FILE的换行符,单词和字节计数,如果指定了多个FILE,则打印总行数。

It doesn't mention anything about multiple total lines (intermediate totals or the like), so who's to blame here?

What's happening is that xargs is running wc multiple times. By default, xargs packs as many arguments as it can into each invocation of the command it runs, up to the system's command-line length limit. If there are too many files to fit, it runs the command several times, each on a subset of the files, and each of those wc runs prints its own "total" line.
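The batching can be reproduced on a small scratch tree. In this sketch, `-n 2` is an artificial cap standing in for the real command-line length limit (it tells xargs to pass at most two file names per wc run); the scratch directory and file names are hypothetical test data.

```shell
#!/bin/sh
# Four 2-line files in a throwaway directory (hypothetical data)
tmp=$(mktemp -d)
for i in 1 2 3 4; do printf 'a\nb\n' > "$tmp/f$i.cs"; done

# -n 2 forces xargs to split the list into two wc invocations,
# mimicking what happens when the real list exceeds the length limit
out=$(find "$tmp" -name '*.cs' -print0 | xargs -0 -n 2 wc -l)
printf '%s\n' "$out"   # two batches of two files, so two "total" lines

rm -r "$tmp"
```

Each batch reports a "total" of 4, which is exactly the pattern in the question: several partial totals instead of one grand total.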

There are a couple of ways I see to fix this. The first, which will break if you have too many files (the shell then fails with an "Argument list too long" error), is to skip xargs and use the shell. This may not work well on Cygwin, but would look like this:

wc -l $(find . '(' -name '*.vb' -o -name '*.cs' ')' \
    -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' )

and you also lose the -print0 protection, so file names containing spaces will be split into separate arguments.

The other is to use an awk (or perl) script to process the output of your find/xargs combination, skip the "total" lines, and sum up the total yourself.
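A minimal awk sketch of that post-processing step follows. `sum_wc_lines` is a hypothetical helper name, and the scratch files are test data; in the question's setup you would append `| sum_wc_lines` after `xargs -0 wc -l`. Because it sums the per-file lines rather than the partial totals, it also handles a final batch containing a single file (for which wc prints no "total" line). Caveat: a file whose name is, or ends in, " total" would be skipped too.

```shell
#!/bin/sh
# Drop every per-batch "total" line printed by wc, pass the per-file
# lines through, and print one grand total at the end.
sum_wc_lines() {
  awk '$NF == "total" { next }
       { sum += $1; print }
       END { print sum + 0, "total" }'
}

# Demo on scratch files; -n 2 forces three wc batches (2 + 2 + 1 files)
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do printf 'a\nb\n' > "$tmp/f$i.cs"; done
out=$(find "$tmp" -name '*.cs' -print0 | xargs -0 -n 2 wc -l | sum_wc_lines)
printf '%s\n' "$out"
rm -r "$tmp"
```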

You're calling wc multiple times: once for each "batch" of input arguments provided by xargs. You're getting one total per batch.

One alternative is to write the file list to a temporary file and use the --files0-from option of wc:

$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a \
    '!' -iname '.svn' -print0 > files

$ wc -l --files0-from=files

The command-line length is much more limited under Cygwin than on a standard Linux box, and xargs must split the input to respect those limits. You can check the limits with xargs --show-limits:

On Cygwin:

$ xargs --show-limits < /dev/null
Your environment variables take up 4913 bytes
POSIX upper limit on argument length (this system): 25039
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 20126
Size of command buffer we are actually using: 25039

On CentOS:

$ xargs --show-limits < /dev/null
Your environment variables take up 1816 bytes
POSIX upper limit on argument length (this system): 2617576
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2615760
Size of command buffer we are actually using: 131072
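To see how xargs actually splits a file list on a given system, each invocation can simply report how many arguments it received. This is a diagnostic sketch: `batch_sizes` is a hypothetical helper name and the scratch files stand in for a real source tree. On Cygwin with a large tree you would expect several lines; on Linux a moderate list usually fits in one batch.

```shell
#!/bin/sh
# Print the number of file names xargs packs into each command run.
# The trailing "sh" becomes $0 of the inline script, so "$#" counts
# only the file names that xargs appended.
batch_sizes() {
  xargs -0 sh -c 'echo "$#"' sh
}

# Demo on scratch files; for a real tree, pipe your own
# find ... -print0 command into batch_sizes instead.
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$tmp/f$i.cs"; done
sizes=$(find "$tmp" -name '*.cs' -print0 | batch_sizes)
printf '%s\n' "$sizes"
rm -r "$tmp"
```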

And to build on @JonSkeet's answer, you don't need to create an additional file: you can pipe your find results directly to wc by passing - as the argument to --files0-from:

find . -name '*.vb' -print0 | wc -l --files0-from=-
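Applied to the question's function, this might look like the sketch below, assuming GNU coreutils wc (which provides --files0-from). A single wc process reads the whole NUL-separated list from stdin, so no command-line limit applies and exactly one "total" line is printed. The scratch directory and files in the demo are hypothetical.

```shell
#!/bin/sh
# The question's count_curdir_src, with one wc reading the file list
# from stdin instead of receiving it as xargs-built arguments.
count_curdir_src() {
  find . '(' -name '*.vb' -o -name '*.cs' ')' \
      -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 |
    wc -l --files0-from=-
}

# Demo in a scratch directory (hypothetical files)
tmp=$(mktemp -d)
for i in 1 2 3; do printf 'x\ny\n' > "$tmp/f$i.cs"; done
printf 'z\n' > "$tmp/Form1.Designer.cs"   # excluded by the -iname test
out=$(cd "$tmp" && count_curdir_src)
printf '%s\n' "$out"
rm -r "$tmp"
```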

To avoid multiple "total" lines when feeding wc an enormous number of file paths as command-line arguments, you can use an intermediate xargs to cat the contents of the files into wc's standard input (see "piping output of find to xargs wc gives unreasonable totals").

This is a workaround if your wc command does not have the --files0-from option mentioned by Xavier.

count_curdir_src() (
   export LC_ALL=C
   find . -name '*.vb' -print0 | xargs -0 -n 1000 cat | wc -l 
)
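One might worry that concatenating files merges the last line of one file with the first line of the next. Since wc -l counts newline characters, the total is unaffected; only the per-file breakdown is lost. The sketch below checks this on two hypothetical scratch files, one of them without a trailing newline.

```shell
#!/bin/sh
# wc -l counts newline characters, so cat-ing files together does not
# change the count, even when a file lacks a trailing newline.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.cs"   # 2 newlines
printf 'c\nd'   > "$tmp/two.cs"   # 1 newline (no trailing newline)

# Per-file counts summed by hand; -n 1 means wc never prints a "total"
per_file=$(find "$tmp" -name '*.cs' -print0 | xargs -0 -n 1 wc -l |
  awk '{ sum += $1 } END { print sum }')

# The cat approach: a single wc reading everything on stdin
via_cat=$(find "$tmp" -name '*.cs' -print0 | xargs -0 cat | wc -l | tr -d ' ')

printf 'per-file sum: %s, via cat: %s\n' "$per_file" "$via_cat"
rm -r "$tmp"
```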
