Using a Perl regex to capture all the words in a line and count their occurrences

Question

I am trying to find out how many words there are in a paragraph and then count the occurrences of each word. I can do it, but is there another way to do it using only a regex?

my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";
my @list = ();
while($string =~ /(\b\w+\b)/gi)
{
        push(@list, $1);
}

my %counts;
for (@list) {
   $counts{$_}++;
}
print "$#list \n";   # note: $#list is the last index (20), one less than the word count (21)
foreach my $keys (keys %counts) {
   print "$keys = $counts{$keys}\n";
}

Output should be

20
brother = 1
a = 1
goes = 1
is = 2
good = 1
to = 1
tiffin = 1
When = 1
boy = 1
his = 2
school = 1
Johnny = 1
he = 1
eats = 1
John = 3
with = 1
hungry = 1
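
A note on the first line of that output: the per-word counts above add up to 21, while the total printed is 20. The difference comes from `$#list`, which is the index of the last element rather than the number of elements. A minimal sketch of the distinction, using a hypothetical three-element list rather than the paragraph from the question:

```perl
use strict;
use warnings;

# Three illustrative elements (not the full paragraph).
my @list = qw(John is a);

my $last_index = $#list;          # index of the final element
my $count      = scalar(@list);   # number of elements

print "last index: $last_index\n";   # prints "last index: 2"
print "count: $count\n";             # prints "count: 3"
```

So printing `scalar(@list)` instead of `$#list` should give the actual number of words.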

Answer 1

I can't see a way to do this purely with a regex, and if such a way did exist, it would be an overly complicated regex that would be very hard to maintain. But you can simplify what you have by using just a hash and dropping the list:

use strict;
use warnings;

my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";
my %counts;
my $word_count = 0;
while($string =~ /\b(\w+)\b/g)
    {
    $counts{$1}++;
    $word_count++;
    }

print "$word_count\n";
foreach my $keys (keys %counts)
    {
    print "$keys = $counts{$keys}\n";
    }

Note: I've tweaked the regex slightly: you don't need the \b inside the capture group, and the case-insensitive flag isn't required because the pattern doesn't match any specific letters. I've also added "use strict;" and "use warnings;", which you should always have at the top of your Perl code so that it flags any problems.
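
If the goal is mainly to shorten the loop, one possible variation (a sketch, not the only way to do it) relies on the fact that a /g match in list context returns all captures at once. The sort at the end is an addition of mine, included only to make the output order stable, since hash keys come back in no particular order:

```perl
use strict;
use warnings;

my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";

# In list context, a /g match returns every captured word in one go.
my @words = $string =~ /\b(\w+)\b/g;

# Tally each word in a hash keyed by the word itself.
my %counts;
$counts{$_}++ for @words;

# Number of words found (prints 21 for this string).
print scalar(@words), "\n";

# Highest count first, ties broken alphabetically.
for my $word (sort { $counts{$b} <=> $counts{$a} or $a cmp $b } keys %counts) {
    print "$word = $counts{$word}\n";
}
```

This still uses a hash for the counting; the regex only does the word extraction.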

Answer 1 (2, accepted) 2017-10-05 15:34:33
