Performance of wc -l

I ran the following command:

time for i in {1..100}; do find / -name "*.service" | wc -l; done

It printed 100 lines of results, followed by the timing:

real    0m35.466s
user    0m15.688s
sys     0m14.552s

I then ran the following command:

time for i in {1..100}; do find / -name "*.service" | awk 'END{print NR}'; done

Again it printed 100 lines of results, followed by the timing:

real    0m35.036s
user    0m15.848s
sys     0m14.056s

To be precise, I had already run find / -name "*.service" just before, so its results were cached for both commands.

I expected wc -l to be faster. Why is it not?

Others have mentioned that you're probably timing find, not wc or awk. Still, there may be interesting differences to explore between wc and awk in their various flavors.

Here are the results I get:
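One way to check that claim is to capture the find output once and then time only the counting step on the saved file. The following is a minimal sketch, assuming bash; the temporary file and the use of /usr instead of / (to keep the run short) are illustrative choices, not part of the original question.

```shell
#!/usr/bin/env bash
# Capture the find output once so the counters can be timed without find.
list=$(mktemp)
find /usr -name "*.service" 2>/dev/null > "$list"

# Time only the counting step, 100 iterations each, as in the question.
time for i in {1..100}; do wc -l < "$list"; done > /dev/null
time for i in {1..100}; do awk 'END{print NR}' "$list"; done > /dev/null

# Sanity check: both counters should report the same number of lines.
wc_count=$(wc -l < "$list" | tr -d ' ')
awk_count=$(awk 'END{print NR}' "$list")
echo "wc: $wc_count  awk: $awk_count"
rm -f "$list"
```

If these two loops finish in a small fraction of the roughly 35 seconds reported in the question, the bulk of that time belongs to find rather than to the counter.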

Mac OS 10.10.5 awk    0.16m lines/second
GNU awk/gawk 4.1.4    4.4m  lines/second
Mac OS 10.10.5 wc     6.8m  lines/second
GNU wc 8.27          11m    lines/second

I didn't use find, but instead ran wc -l or awk 'END{print NR}' on a large text file (66k lines) in a loop.

I varied the order of the commands and didn't find any deviations large enough to change the rankings I reported.

LC_CTYPE=C had no measurable effect on any of these.
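The measurement described above could be reproduced roughly as follows. This is only a sketch under stated assumptions: it needs bash, the 66,000-line file is generated with seq as a stand-in for the original text file, and the bench helper and the repeat count of 50 are hypothetical, not the commands actually used for the table.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark: lines/second for a line counter on a 66k-line file.
file=$(mktemp)
trap 'rm -f "$file"' EXIT   # clean up the temporary file on exit
seq 66000 > "$file"         # stand-in for the "large text file (66k lines)"
runs=50                     # arbitrary repeat count, not from the original post

# bench LABEL CMD...: run CMD on $file $runs times and report lines/second.
bench() {
  local label=$1; shift
  local elapsed
  TIMEFORMAT=%R             # make bash's `time` print only elapsed seconds
  elapsed=$( { time for ((i = 0; i < runs; i++)); do "$@" "$file" > /dev/null; done; } 2>&1 )
  awk -v l="$label" -v t="$elapsed" -v n="$((66000 * runs))" \
    'BEGIN { if (t > 0) printf "%-4s %.2fm lines/second\n", l, n / t / 1e6
             else printf "%-4s too fast to measure\n", l }'
}

bench wc  wc -l
bench awk awk 'END{print NR}'
```

Because each iteration starts a new process on a fairly small file, process startup contributes to the measured time; a larger input would weight the result more toward raw counting speed.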

Conclusions

  1. Don't use the built-in macOS command-line tools for anything but trivial amounts of data.

  2. GNU wc is faster than GNU awk at counting lines.

I use the MacPorts GNU binaries. It would be interesting to see how the Homebrew binaries compare. (I'm guessing they'd lose.)

Three things:

  1. Such a small difference is usually not significant:

     0m35.466s - 0m35.036s = 0m0.43s or 1.2% 
  2. Yet wc -l is much faster than awk 'END{print NR}' (about 8.5x by wall-clock time in the timings that follow):

     % time seq 100000000 | awk 'END{print NR}' > /dev/null
     real    0m13.624s
     user    0m14.656s
     sys     0m1.047s

     % time seq 100000000 | wc -l > /dev/null
     real    0m1.604s
     user    0m2.413s
     sys     0m0.623s

  3. My guess is that the disk cache holds the find results, so after the first run with wc -l, most of the reads needed by find are served from the cache. Presumably the difference in time between the initial find with disk reads and the second find with cache reads is greater than the difference in run time between awk and wc.

    One way to test this is to reboot, which clears the disk cache, then run the two tests again in the reverse order, so that awk is run first. I'd expect the first-run awk to be even slower than the first-run wc, and the second-run wc to be faster than the second-run awk.
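The arithmetic behind the first two points can be checked directly. This is a small sketch that only recomputes numbers already quoted in this thread; LC_ALL=C is set so that awk prints a decimal point regardless of locale.

```shell
# Relative difference between the two find-based runs (point 1).
diff_line=$(LC_ALL=C awk 'BEGIN { a = 35.466; b = 35.036
                                  printf "%.2fs (%.1f%%)", a - b, (a - b) / a * 100 }')
echo "$diff_line"   # prints 0.43s (1.2%)

# Wall-clock speedup of wc -l over awk in the seq test (point 2).
speedup=$(LC_ALL=C awk 'BEGIN { printf "%.1fx", 13.624 / 1.604 }')
echo "$speedup"     # prints 8.5x
```

So the two pipelines in the question differ by about 1%, while the counters themselves differ by nearly an order of magnitude once find is out of the picture.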

