I want to get the first two letters of every word in the BSD dict word list, excluding words that are only one letter long.
Without the one-letter exclusion it runs extremely fast:
time cat /usr/share/dict/web2 | cut -c 1-2 | tr '[a-z]' '[A-Z]' | uniq -c > /dev/null
real 0m0.227s
user 0m0.375s
sys 0m0.021s
Grepping on '..', however, is painfully slow:
time cat /usr/share/dict/web2 | cut -c 1-2 | grep '..' | tr '[a-z]' '[A-Z]' | uniq -c > /dev/null
real 1m16.319s
user 1m0.694s
sys 0m10.225s
What's going on here?
What's really slow on the Mac is the UTF-8 locale. Replace grep '..' with LC_ALL=C grep '..' and your command will run over 100x faster.
This is probably true of Linux as well, except a given Linux distro is probably more likely to default to the C environment.
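As a rough sketch of that change, the pipeline from the question can be wrapped in a function with the locale override applied only to grep. The function name prefix_counts and the tiny sample word list are made up for illustration; substitute /usr/share/dict/web2 to time it on the real data.

```shell
# Hypothetical helper (name is an assumption): count two-letter prefixes
# read from stdin, dropping words that are only one character long.
prefix_counts() {
    # LC_ALL=C applies to grep only, so it matches bytes instead of
    # decoding multibyte UTF-8 characters.
    cut -c 1-2 | LC_ALL=C grep '..' | tr '[:lower:]' '[:upper:]' | uniq -c
}

# Small sample standing in for the dictionary file.
printf '%s\n' a aa aardvark ab abacus b ba | prefix_counts
```

On the sample this should report a count of 2 for AA, 2 for AB, and 1 for BA; the leading whitespace that uniq -c emits differs between implementations.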
I don't know why it is so awful. But I know one quick way to speed it up is to invert your grep(1) expression with -v, and throw away all one-character lines:
$ time cat /usr/share/dict/words | cut -c 1-2 | grep -v '^.$' | tr '[a-z]' '[A-Z]' | uniq -c > /dev/null
real 0m0.086s
user 0m0.090s
sys 0m0.000s
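One caveat worth checking: the inverted pattern is not strictly equivalent to grep '..', because it keeps empty lines while grep '..' drops them. The sketch below shows the difference on a made-up sample (the sample function is purely illustrative). For a dictionary file with no blank lines the two filters should select the same words.

```shell
# Hypothetical sample input: two one-letter words, one empty line,
# and two longer words.
sample() { printf '%s\n' a ab '' abc b; }

# Lines containing at least two characters: ab, abc
sample | LC_ALL=C grep -c '..'

# Lines that are not exactly one character: ab, the empty line, abc
sample | LC_ALL=C grep -vc '^.$'
```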
This might run better, and it also gets rid of the cut that needed a separate pipe:
cat /usr/share/dict/web2 | egrep -o '^.{2}' | tr '[a-z]' '[A-Z]' | uniq -c > /dev/null
If you cut down on the excess pipes and the useless cat, it might be even faster:
$ awk 'length($0) > 1 { a[toupper(substr($0,1,2))]++ } END { for (i in a) print i, a[i] }' file
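Note that awk's for (i in a) loop visits keys in an unspecified order, and that the output here is "prefix count" rather than the "count prefix" layout of uniq -c. A minimal sketch of the same idea with a length guard and sorted output follows; the function name prefix_counts_awk and the sample words are assumptions for illustration.

```shell
# Hypothetical wrapper (name is an assumption): reads words on stdin,
# skips one-character lines, counts upper-cased two-letter prefixes.
prefix_counts_awk() {
    awk 'length($0) > 1 { n[toupper(substr($0, 1, 2))]++ }
         END { for (p in n) print p, n[p] }' | LC_ALL=C sort
}

# Small sample standing in for the dictionary file.
printf '%s\n' a aa Aardvark ab abacus b ba | prefix_counts_awk
```

Because the counts live in an associative array, this version does not depend on the input being sorted, unlike the uniq -c pipelines.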