Performance of wc -l
I ran the following command:
time for i in {1..100}; do find / -name "*.service" | wc -l; done
It printed 100 lines of results, followed by:
real 0m35.466s
user 0m15.688s
sys 0m14.552s
I then ran the following command:
time for i in {1..100}; do find / -name "*.service" | awk 'END{print NR}'; done
It printed 100 lines of results, followed by:
real 0m35.036s
user 0m15.848s
sys 0m14.056s
To be precise, I had already run find / -name "*.service" just before, so its results were cached for both commands.
I expected wc -l to be faster. Why is it not?
Others have mentioned that you're probably timing find, not wc or awk. Still, there may be interesting differences to explore between wc and awk in their various flavors.
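One way to take find out of the measurement is to pay for the directory walk once, save its output, and then run only the counters on the saved list. The sketch below assumes a small scratch tree created with mktemp (hypothetical; the question searched /), so that it is self-contained:

```shell
#!/bin/sh
# Hypothetical scratch tree standing in for "/" so the sketch runs anywhere.
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/a/one.service" "$root/a/b/two.service" "$root/a/other.txt"

# Pay the traversal cost once and keep the output.
list=$(mktemp)
find "$root" -name "*.service" > "$list"

# From here on, only the counters differ; find is no longer involved.
# tr strips the leading padding that some wc implementations print.
wc_count=$(wc -l < "$list" | tr -d ' ')
awk_count=$(awk 'END{print NR}' "$list")
echo "wc: $wc_count awk: $awk_count"

# Clean up the scratch files.
rm -rf "$root" "$list"
```

Prefixing each counting command with time (and looping it, as in the question) would then compare wc and awk rather than find.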
Here are the results I get:
Mac OS 10.10.5 awk      0.16M lines/second
GNU awk/gawk 4.1.4      4.4M lines/second
Mac OS 10.10.5 wc       6.8M lines/second
GNU wc 8.27             11M lines/second
I didn't use find, but instead ran wc -l or awk 'END{print NR}' on a large text file (66k lines) in a loop.
I varied the order of the commands and didn't find any deviations large enough to change the rankings I reported.
Setting LC_CTYPE=C had no measurable effect on any of these.
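A minimal sketch of that kind of loop, for anyone who wants to reproduce the lines/second figures. The file contents (seq output) and the run count of 20 are assumptions for illustration; only the 66k-line file size comes from the description above.

```shell
#!/bin/sh
# Build a 66,000-line test file (contents are arbitrary; seq is an assumption).
file=$(mktemp)
seq 66000 > "$file"

runs=20   # assumed repetition count, pick whatever gives a stable timing
total=0
i=0
while [ "$i" -lt "$runs" ]; do
  # Swap in: awk 'END{print NR}' "$file"   to measure awk instead.
  n=$(wc -l < "$file")
  total=$((total + n))
  i=$((i + 1))
done
echo "$total lines counted in $runs runs"

rm -f "$file"
```

Timing the whole loop and dividing the total line count by the elapsed real time gives a lines/second figure. With a file this small, process startup is a large share of each run, so a bigger file or more runs makes the comparison steadier.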
Conclusions
Don't use the Mac's built-in command-line tools except for trivial amounts of data.
GNU wc is faster than GNU awk at counting lines.
I use MacPorts GNU binaries. It would be interesting to see how Homebrew binaries compare. (I'm guessing they'd lose.)
Three things:
Such a small difference is usually not significant:
0m35.466s - 0m35.036s = 0m0.43s or 1.2%
Yet wc -l is much faster than awk 'END{print NR}' (close to 10x; about 8.5x by wall-clock time in this run):
% time seq 100000000 | awk 'END{print NR}' > /dev/null
real 0m13.624s
user 0m14.656s
sys 0m1.047s
% time seq 100000000 | wc -l > /dev/null
real 0m1.604s
user 0m2.413s
sys 0m0.623s
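The two ratios quoted here can be checked from the reported real times. This is only the arithmetic, done with awk; the variable names are illustrative:

```shell
#!/bin/sh
# Relative gap between the two find loops (real times from the question), in percent.
gap=$(awk 'BEGIN { printf "%.1f", (35.466 - 35.036) / 35.466 * 100 }')
# Wall-clock speedup of wc -l over awk in the seq test.
speedup=$(awk 'BEGIN { printf "%.1f", 13.624 / 1.604 }')
echo "gap: ${gap}%  speedup: ${speedup}x"
```

The first figure is small enough to be run-to-run noise, while the second is a real difference between the two tools.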
My guess is that the disk cache holds the find results, so after the first run with wc -l, most of the reads needed for find are already in the cache. Presumably the difference in time between the initial find, with disk reads, and the second find, with cache reads, would be greater than the difference in run time between awk and wc.
One way to test this is to reboot, which clears the disk cache, then run the two tests again, but in the reverse order, so that awk is run first. I'd expect that the first-run awk would be even slower than the first-run wc, and the second-run wc would be faster than the second-run awk.