Why does the wc utility generate multiple lines with "total"?
I am using the wc utility in a shell script that I run from Cygwin, and I noticed that there is more than one line with "total" in its output.
The following function is used to count the number of lines in my source files:
count_curdir_src() {
find . '(' -name '*.vb' -o -name '*.cs' ')' \
-a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | \
xargs -0 wc -l
}
But its output for a certain directory looks like this:
$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | xargs -0 wc -l
19 ./dirA/fileABC.cs
640 ./dirA/subdir1/fileDEF.cs
507 ./dirA/subdir1/fileGHI.cs
2596 ./dirA/subdir1/fileJKL.cs
(...many others...)
58 ./dirB/fileMNO.cs
36 ./dirB/subdir1/filePQR.cs
122200 total
6022 ./dirB/subdir2/subsubdir/fileSTU.cs
24 ./dirC/fileVWX.cs
(...)
36 ./dirZ/Properties/AssemblyInfo.cs
88 ./dirZ/fileYZ.cs
25236 total
It looks like wc resets somewhere in the process. It cannot be caused by space characters in filenames or directory names, because I use the -print0 option. And it only happens when I run it on my largest source tree.
So, is this a bug in wc, or in Cygwin? Or something else?
The wc manpage says:
Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified.
It doesn't mention anything about multiple total lines (intermediate totals or the like), so who's to blame here?
What's happening is that xargs is running wc multiple times. By default, xargs packs as many arguments as it thinks it can into each invocation of the command it's supposed to run, but if there are too many files it will run the command multiple times, each time on a subset of the files. Each wc invocation that receives more than one file prints its own "total" line.
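The batching can be reproduced on a small scale. The sketch below is only an illustration: the function name demo_multiple_totals and the temporary files are made up, and -n 2 is used to force the split that xargs would otherwise perform only when the command line gets too long.

```shell
# Hypothetical demo: four one-line files, at most two per wc invocation.
demo_multiple_totals() {
  tmp=$(mktemp -d) || return 1
  for i in 1 2 3 4; do
    printf 'line\n' > "$tmp/f$i.txt"
  done
  # -n 2 makes xargs run wc twice, so wc prints two separate "total" lines.
  find "$tmp" -name '*.txt' -print0 | xargs -0 -n 2 wc -l
  rm -rf "$tmp"
}

demo_multiple_totals
```

Each of the two wc runs sees two files and therefore reports its own total, which is the same effect as in the question, just with a much smaller batch size.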
There are a couple of ways I see to fix this. The first, which will break if you have too many files, is to skip xargs and use the shell. This may not work well on Cygwin, but would look like this:
wc -l $(find . '(' -name '*.vb' -o -name '*.cs' ')' \
-a '!' -iname '*.Designer.*' -a '!' -iname '.svn' )
and you also lose the -print0 capabilities, so filenames containing whitespace will be split.
The other is to use an awk (or perl) script to process the output of your find / xargs combo, skip the "total" lines, and sum up the total yourself.
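A minimal sketch of the awk approach, assuming the wc -l output format shown in the question. The function name sum_wc_lines is hypothetical. It relies on find printing paths that start with ./ so that no real file shows up as a bare "total".

```shell
# Hypothetical helper: reads `wc -l` output on stdin and prints one grand total.
sum_wc_lines() {
  awk '
    # Skip the per-batch summary lines that wc prints ("<count> total").
    NF == 2 && $2 == "total" { next }
    # Every other line starts with a line count; add it up.
    { sum += $1 }
    END { print sum + 0 }
  '
}
```

It would be used at the end of the original pipeline, for example `find . -name '*.cs' -print0 | xargs -0 wc -l | sum_wc_lines`. The per-file counts are no longer printed; only the combined total is.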
You're calling wc multiple times, once for each "batch" of input arguments provided by xargs. You're getting one total per batch.
One alternative is to use a temporary file and the --files0-from option for wc:
$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 > files
$ wc --files0-from files
The command-line length is much more limited under Cygwin than on a standard Linux box, and xargs must split the input to respect those limits. You can check the limits with xargs --show-limits:
On Cygwin:
$ xargs --show-limits < /dev/null
Your environment variables take up 4913 bytes
POSIX upper limit on argument length (this system): 25039
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 20126
Size of command buffer we are actually using: 25039
On CentOS:
$ xargs --show-limits < /dev/null
Your environment variables take up 1816 bytes
POSIX upper limit on argument length (this system): 2617576
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2615760
Size of command buffer we are actually using: 131072
And to build on @JonSkeet's answer, you don't need to create an additional file: you can pipe your find results directly to wc by passing - as the argument to --files0-from:
find . -name '*.vb' -print0 | wc -l --files0-from=-
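Applied to the function from the question, this could look like the sketch below. It keeps the question's find expression unchanged and assumes a GNU wc that supports --files0-from.

```shell
# Sketch: the question's count_curdir_src with xargs replaced by
# wc reading the NUL-separated file list from stdin (GNU wc assumed).
count_curdir_src() {
  find . '(' -name '*.vb' -o -name '*.cs' ')' \
    -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 |
    wc -l --files0-from=-
}
```

Because wc runs exactly once here, it prints the per-file counts followed by a single "total" line, no matter how many files find produces.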
To avoid generating multiple "total" lines when feeding wc an enormous number of file paths as command-line arguments, you can use an intermediate xargs to cat the contents of the files to the stdin of wc (see "piping output of find to xargs wc gives unreasonable totals"). Since wc then reads a single stream, it prints only the overall line count and no per-file counts.
This is a workaround if your wc command does not have the --files0-from option mentioned by Xavier.
count_curdir_src() (
export LC_ALL=C
find . -name '*.vb' -print0 | xargs -0 -n 1000 cat | wc -l
)