How do I count the number of words in a line in Perl?

Question

I know I could write my own while loop with a regex to count the words in a line. But I am processing about 1000 lines, and I don't want to run this loop every time. So I was wondering: is there any way to count the words in a line in Perl?

Answer 1

1000 times is not a significant number to a modern computer. In general, write the code that makes sense to you, and then, if there is a performance problem, worry about optimization.

To count words, first you need to decide what a word is. One approach is to match groups of consecutive word characters, but that counts "it's" as two words. Another is to match groups of consecutive non-whitespace characters, but that counts "phrase - phrase" as three words. Once you have a regex that matches a word, you can count words like this (using consecutive word characters for this example):

scalar( () = $line =~ /\w+/g )
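To make the trade-off between the two definitions concrete, here is a minimal sketch that applies both regexes to the two example strings from the paragraph above. The helper names `count_word_chars` and `count_non_space` are made up for illustration; they are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here for illustration only.
# Assigning the match list to () in scalar context yields the match count.
sub count_word_chars { my ($line) = @_; my $n = () = $line =~ /\w+/g; return $n }
sub count_non_space  { my ($line) = @_; my $n = () = $line =~ /\S+/g; return $n }

for my $line ("it's a word", "phrase - phrase") {
    printf "%-18s \\w+ => %d, \\S+ => %d\n",
        $line, count_word_chars($line), count_non_space($line);
}
```

For "it's a word" the `\w+` count is 4 and the `\S+` count is 3; for "phrase - phrase" they are 2 and 3. Neither definition is right for every input, so pick the one that matches your data.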

Answer 2

How about splitting the line on one or more non-word characters and counting the size of the resulting array?

$ echo "one, two, three" | perl -nE "say scalar split /\W+/"
3

As a sub that would be:

# say count_words 'foo bar' => 2
sub count_words { scalar split /\W+/, shift }
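One caveat with this approach: when the line starts with a non-word character such as a space, `split` returns an empty leading field, which inflates the count. A small sketch of the effect, using a made-up input string:

```perl
use strict;
use warnings;

# The leading space matches /\W+/, so the first field is the empty string.
my @fields = split /\W+/, " one, two, three";
print scalar(@fields), "\n";   # 4, not 3
```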

To get rid of the leading space problem spotted by ysth, you can filter out the empty segments:

$ echo " one, two, three" | perl -nE 'say scalar grep {length $_} split /\W+/'
3

…or strip the leading non-word characters from the input string:

$ echo " one, two, three" | perl -nE 's/^\W+//; say scalar split /\W+/'
3
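Put together for the many-lines case from the question, the filtering variant might look like the sketch below. The sub name `count_words_filtered` and the sample `@lines` array are assumptions for illustration; in a real script the lines would more likely come from a file handle.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of non-word characters and
# discard the empty field that a leading separator produces.
sub count_words_filtered {
    my ($line) = @_;
    return scalar grep { length $_ } split /\W+/, $line;
}

# Stand-in for the real input.
my @lines = (" one, two, three\n", "foo bar\n", "\n");

for my $line (@lines) {
    print count_words_filtered($line), "\n";
}
```

Calling the sub once per line is cheap, so running it over a thousand lines should not be a concern.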
