How can I extract abbreviations from a file using Perl?

Question

I need to extract certain abbreviations from a file, such as ABS, TVS, and PERL: any abbreviation written in uppercase letters. I'd prefer to do this with a regular expression. Any help is appreciated.

Answer 1

It would have been nice to hear which part you were having particular trouble with.

my %abbr;
open my $inputfh, '<', 'filename'
    or die "open error: $!\n";
while ( my $line = readline($inputfh) ) {
    while ( $line =~ /\b([A-Z]{2,})\b/g ) {
        $abbr{$1}++;
    }
}

for my $abbr ( sort keys %abbr ) {
    print "Found $abbr $abbr{$abbr} time(s)\n";
}

Answer 2

Reading the text to be searched from standard input and writing all abbreviations found to standard output, separated by spaces:

my $text;
# Slurp all text
{ local $/ = undef; $text = <>; }
# Extract all sequences of 2 or more uppercase characters
my @abbrevs = $text =~ /\b([[:upper:]]{2,})\b/g;
# Output separated by spaces
print join(" ", @abbrevs), "\n";

Note the use of the POSIX character class [:upper:], which matches all uppercase characters, not just the English ones (A-Z).
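
The difference between [A-Z] and [[:upper:]] only shows up on non-ASCII letters, and only when Perl sees the input as characters rather than raw bytes (for real input that means decoding it first, for example with an :encoding(UTF-8) layer). A minimal sketch of the difference; the helper name find_runs and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return all runs of 2+ characters matched by $class
sub find_runs {
    my ($text, $class) = @_;
    return $text =~ /\b((?:$class){2,})\b/g;
}

# Made-up sample: "ABS" plus a three-letter Cyrillic abbreviation, written
# with \x{...} escapes so the source file itself stays plain ASCII
my $text = "The ABS unit and the \x{41E}\x{41E}\x{41D} unit";

my @ascii_only = find_runs($text, qr/[A-Z]/);         # English capitals only
my @any_upper  = find_runs($text, qr/[[:upper:]]/);   # any uppercase letter

binmode(STDOUT, ':encoding(UTF-8)');   # encode wide characters on output
print "[A-Z]:       @ascii_only\n";
print "[[:upper:]]: @any_upper\n";
```

Here the [A-Z] version should report only ABS, while the [[:upper:]] version should also report the Cyrillic run.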

Answer 3

Untested:


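my %abbr;
open(my $input, "<", "filename")
  || die "open: $!";
# Read line by line; note that <$input> must not contain spaces,
# otherwise Perl treats it as a glob instead of a readline
while (<$input>) {
  # Strip each run of two or more capitals from the line and count it
  while (s/([A-Z][A-Z]+)//) {
    $abbr{$1}++;
  }
}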

Modified it to look for at least two consecutive capital letters.
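
One difference from Answer 1 is worth keeping in mind: without \b anchors, the pattern also picks up runs of capitals inside longer mixed-case words. A small sketch of that difference, using a made-up sample line:

```perl
use strict;
use warnings;

# Made-up sample line: two stand-alone abbreviations and one mixed-case word
my $line = "ABS and TVS are used by XMLParser";

# Without word boundaries: any run of 2+ capitals, even inside a longer word
my @unanchored = $line =~ /([A-Z][A-Z]+)/g;

# With word boundaries: only whole words made up entirely of capitals
my @anchored = $line =~ /\b([A-Z]{2,})\b/g;

print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
```

The unanchored pattern also captures the XMLP at the start of XMLParser, whereas the anchored one skips that word entirely. Which behaviour is right depends on whether such mixed-case words should count as abbreviations.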

Answer 4

#!/usr/bin/perl

use strict;
use warnings;

my %abbrs = ();

while(<>){
    my @words = split ' ', $_;

    foreach my $word(@words){
        $word =~ /([A-Z]{2,})/ && $abbrs{$1}++;
    }
}

# %abbrs now contains all abbreviations

Answer 1
4 2009-07-08 09:18:08

Answer 2
3 2009-07-08 10:15:16

Answer 3
2 2009-07-08 08:09:28

Answer 4
2 2009-07-08 09:25:35
