
Count the number of words and specify the lines

I need to create a file that lists how many times certain words (for example, Word1 and Word2) occur in another file, together with the numbers of the lines where they occur, in this format:

Word1: 35 [25, 50, 300, ...]    
Word2: 15 [10, 25, 65, ...]    

Unfortunately, your question lacks sample input files that demonstrate all the cases you need to handle, and the expected output for them, so I'm just making some up.

Given the files

wordlist.txt:

cat
dog
fish
horse

and input.txt:

There are three fish.
Two red fish.
One blue fish and a brown dog.
There are no matching words on this line.
Also there is no cat, only the dog. Oh, there is a white dog too.
There are doggies.

this Perl script prints each word from the word list, its number of matches, and the lines it matched on, counting a word again each time it appears on the same line:

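#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use English;

die "Usage: $0 WORDLIST TEXTFILE\n" unless @ARGV == 2;

my %words;    # word => reference to an array of line numbers where it occurs

# Read the words to look for, one per line
open my $wordlist, "<", $ARGV[0];
while (<$wordlist>) {
    chomp;
    next unless length;    # skip blank lines; an empty word would match at every word boundary
    $words{$_} = [];
}

# Scan the text line by line; $NR (alias for $.) is the current line number
open my $text, "<", $ARGV[1];
while (<$text>) {
    while (my ($word, $positions) = each %words) {
        while (m/\b\Q$word\E\b/g) { # Match all occurrences of the word by itself
            push @$positions, $NR;
        }
    }
}

# Print "word: count line,line,..." in alphabetical order;
# $OFS (alias for $,) puts a space between say's arguments
$OFS = ' ';
for my $word (sort keys %words) {
    my $positions = $words{$word};
    say "$word:", scalar(@$positions), join(',', @$positions);
}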
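The heart of the script is the `m/\b\Q$word\E\b/g` loop: `\b` on both sides restricts matches to the whole word, `\Q...\E` escapes any regex metacharacters in the word, and `/g` in a loop finds every occurrence on the line. The standalone sketch below isolates that idea; `count_whole_word` is a hypothetical helper for illustration, not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count whole-word occurrences of $word in $line,
# using the same pattern as the script's inner loop.
sub count_whole_word {
    my ($word, $line) = @_;
    my $count = 0;
    # \b...\b requires a word boundary on both sides, so "dog" does not
    # match inside "doggies"; \Q...\E treats $word as a literal string.
    $count++ while $line =~ m/\b\Q$word\E\b/g;
    return $count;
}

print count_whole_word('dog', 'Also there is no cat, only the dog. Oh, there is a white dog too.'), "\n";  # prints 2
print count_whole_word('dog', 'There are doggies.'), "\n";                                                # prints 0
```

This is why line 6 of input.txt ("There are doggies.") contributes nothing to the dog count. Note that the match is case-sensitive; adding the `/i` modifier would make it case-insensitive.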

Example:

$ perl words.pl wordlist.txt input.txt
cat: 1 5
dog: 3 3,5,5
fish: 3 1,2,3
horse: 0
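The output above uses `word: count line,line` rather than the bracketed `Word: count [line, line, ...]` layout from the question. A minimal sketch of a formatter for that layout follows; `format_entry` is a hypothetical name, and the sample data is copied from the example run above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: format one word's results as "Word: count [line, line, ...]".
# $positions is a reference to an array of line numbers, as in %words above.
sub format_entry {
    my ($word, $positions) = @_;
    return sprintf '%s: %d [%s]', $word, scalar(@$positions), join(', ', @$positions);
}

# Sample data taken from the example run above
my %words = (
    cat   => [5],
    dog   => [3, 5, 5],
    fish  => [1, 2, 3],
    horse => [],
);

print format_entry($_, $words{$_}), "\n" for sort keys %words;
# prints:
# cat: 1 [5]
# dog: 3 [3, 5, 5]
# fish: 3 [1, 2, 3]
# horse: 0 []
```

To use it in words.pl, one could paste the sub into the script and replace the `say "$word:", ...` line with `say format_entry($word, $positions);`. Since the question asks for a file, redirecting standard output is the simplest option, e.g. `perl words.pl wordlist.txt input.txt > counts.txt` (the output file name is just an example).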

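Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.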

© 2020-2024 STACKOOM.COM