
How to count the number of words in a line in Perl?

I know I could write my own while loop with a regex to count the words in a line. But I am processing about 1000 lines, and I don't want to run this loop every time. So I was wondering: is there a simpler way to count the words in a line in Perl?

Running something 1000 times is not a significant amount of work for a modern computer. In general, write the code that makes sense to you, and then, if there is a performance problem, worry about optimization.

To count words, first you need to decide what a word is. One approach is to match groups of consecutive word characters, but that counts "it's" as two words. Another is to match groups of consecutive non-whitespace characters, but that counts "phrase - phrase" as three words. Once you have a regex that matches a word, you can count words like this (using consecutive word characters for this example):

scalar( () = $line =~ /\w+/g )
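To make the difference between the two word definitions concrete, here is a minimal sketch that wraps the counting idiom above in two helper subs. The names `count_word_chars` and `count_non_space` are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Count runs of word characters (\w+): "it's" is seen as "it" + "s".
sub count_word_chars {
    my ($line) = @_;
    # A list assignment in scalar context yields the number of
    # elements on its right-hand side, i.e. the number of matches.
    return scalar( () = $line =~ /\w+/g );
}

# Count runs of non-whitespace (\S+): a lone "-" counts as a word.
sub count_non_space {
    my ($line) = @_;
    return scalar( () = $line =~ /\S+/g );
}

print count_word_chars("it's here"), "\n";        # 3
print count_non_space("it's here"), "\n";         # 2
print count_word_chars("phrase - phrase"), "\n";  # 2
print count_non_space("phrase - phrase"), "\n";   # 3
```

Neither definition is "correct" in general; pick the regex that matches what your input considers a word.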

How about splitting the line on one or more non-word characters and counting the number of fields that result?

$ echo "one, two, three" | perl -nE "say scalar split /\W+/"
3

As a subroutine, that would be:

# say count_words 'foo bar' => 2
sub count_words { scalar split /\W+/, shift }
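This sub miscounts when the input begins with a separator, because `split` keeps a leading empty field (trailing empty fields are dropped, leading ones are not). A minimal check, assuming Perl 5.12 or later, where `split` in scalar context simply returns the number of fields:

```perl
use strict;
use warnings;

# Same one-liner as above: split on runs of non-word characters
# and return the number of resulting fields.
sub count_words { scalar split /\W+/, shift }

print count_words('foo bar'), "\n";   # 2
# The leading space produces an empty first field, which is counted too.
print count_words(' foo bar'), "\n";  # 3
```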

To get rid of the leading-space problem spotted by ysth (a line that starts with a non-word character yields an empty first field, which `split` keeps and therefore counts), you can filter out the empty fields:

$ echo " one, two, three" | perl -nE 'say scalar grep {length $_} split /\W+/'
3

…or strip the leading non-word characters from the input string first:

$ echo " one, two, three" | perl -nE 's/^\W+//; say scalar split /\W+/'
3
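Applied to the original question's scenario of many lines, the `grep` variant can be wrapped in a sub and called once per line. This is a minimal sketch: the sub name `count_words_clean` is hypothetical, and an in-memory filehandle stands in for a real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: count \w+-delimited words, discarding the empty
# field that split produces when the line starts with a separator.
sub count_words_clean {
    my ($line) = @_;
    # grep in scalar context returns how many fields have non-zero length.
    return scalar grep { length } split /\W+/, $line;
}

# Stand-in for a real file: three lines, the last one empty.
my $text = " one, two, three\nit's here\n\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $total = 0;
while (my $line = <$fh>) {
    my $n = count_words_clean($line);
    $total += $n;
    print "$n\n";    # per-line counts: 3, 3, 0
}
close $fh;

print "total: $total\n";  # total: 6
```

The per-line cost is one `split` and one `grep`, so a few thousand lines should not be a concern; measure before optimizing further.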

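Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.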

© 2020-2024 STACKOOM.COM