Perl String Parsing to Hash

So let's say I have the string:

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";

I want to create a hash where each key is a unique word and the value is the number of times it appears in the string, and I want the process to be automated. For example:

my %words = (
  "Hello" => 1,
  "are" => 2,
  "thats" => 2,
  "Thats" => 1,
);

I am honestly brand new to Perl and have no clue how to do this, or how to handle the punctuation, etc.

UPDATE:

Also, is it possible to use

   split('.!?;',$mystring)   

Not with this syntax, but basically split at a . or ! or ? etc., and also at ' ' (whitespace)?

One simple way to do it is to split the string on any character that is not a valid word character in your view. Note that this is by no means an exhaustive solution as it stands; I have simply picked a limited set of characters.

You can add valid word characters inside the brackets [ ... ] as you discover edge cases. You might also search http://search.cpan.org for modules designed for this purpose.

The regex [^ ... ] matches any character that is not inside the brackets. \pL matches any Unicode letter (a larger set than a-z), and the other characters in the class are literals. The dash - must be escaped because it is a metacharacter (the range operator) inside a character class.

use strict;
use warnings;
use Data::Dumper;

my $str = "Hello how are you today. Oh thats good I'm glad you are happy.
           Thats wonderful; thats fantastic.";
my %hash;
$hash{$_}++                      # increase count for each field
    for                          # in the loop
    split /[^\pL'\-!?]+/, $str;  # over the list from splitting the string 
print Dumper \%hash;

Output:

$VAR1 = {
          'wonderful' => 1,
          'glad' => 1,
          'I\'m' => 1,
          'you' => 2,
          'how' => 1,
          'are' => 2,
          'fantastic' => 1,
          'good' => 1,
          'today' => 1,
          'Hello' => 1,
          'happy' => 1,
          'Oh' => 1,
          'Thats' => 1,
          'thats' => 2
        };
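
For the split asked about in the question's update (breaking at ., !, ?, ; and whitespace), one option is to list the delimiters in a character class instead of negating the word characters. This is only a minimal sketch, assuming that delimiter set is what you want; the sub name count_by_delimiters is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count words by splitting on an explicit delimiter set.
sub count_by_delimiters {
    my ($text) = @_;
    my %count;
    # Inside [...] the characters . ! ? ; are literal; \s adds whitespace.
    # grep { length } drops the empty leading field that split returns
    # when the text starts with a delimiter.
    $count{$_}++ for grep { length } split /[.!?;\s]+/, $text;
    return \%count;
}

my $counts = count_by_delimiters(
    "Hello how are you today. Oh thats good I'm glad you are happy. "
  . "Thats wonderful; thats fantastic."
);
printf "%-10s => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Because the apostrophe is not in the delimiter set, I'm stays a single word. Any punctuation you do not list (a comma, for example) stays attached to its word.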

This will use whitespace to separate words.

#!/usr/bin/env perl
use strict;
use warnings;

my $str = "Hello how are you today."
        . " Oh thats good I'm glad you are happy."
        . " Thats wonderful. thats fantastic.";

# Use whitespace to split the string into single "words".
my @words = split /\s+/, $str;

# Store each word in the hash and count its occurrence.
my %hash;
for my $word ( @words ) {
    $hash{ $word }++;
}

# Show each word and its count. Using printf to align output.
for my $key ( sort keys %hash ) {
    printf "%-10s => %d\n", $key, $hash{ $key };
}

You will need some fine-tuning to get "real" words, because punctuation stays attached to the words:

Hello      => 1
I'm        => 1
Oh         => 1
Thats      => 1
are        => 2
fantastic. => 1
glad       => 1
good       => 1
happy.     => 1
how        => 1
thats      => 2
today.     => 1
wonderful. => 1
you        => 2
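
One possible form of that fine-tuning is to trim punctuation from both ends of each whitespace-separated token before counting. The following is a sketch rather than a complete solution; the sub name count_trimmed is hypothetical, and it assumes that only letters and apostrophes belong at the edges of a word.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace, then trim punctuation from
# both ends of each token so that "today." is counted as "today".
sub count_trimmed {
    my ($text) = @_;
    my %count;
    for my $token ( split ' ', $text ) {
        my $word = $token;
        # Strip leading and trailing characters that are neither
        # letters nor apostrophes; the inside of the token is untouched.
        $word =~ s/^[^\pL']+|[^\pL']+$//g;
        next unless length $word;    # token was punctuation only
        $count{$word}++;
    }
    return \%count;
}

my $counts = count_trimmed(
    "Hello how are you today. Oh thats good I'm glad you are happy."
  . " Thats wonderful. thats fantastic."
);
printf "%-10s => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

With this change the keys today., happy., wonderful. and fantastic. lose their trailing period, while I'm is kept intact. Quoted words such as 'hello' would keep their quotes, since apostrophes are deliberately preserved.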

Try this:

use strict;
use warnings;

my $str = "Hello, how are you today. Oh thats good I'm glad you are happy. 
           Thats wonderful.";
my @strAry = split /[:,\.\s\/]+/, $str;
my %strHash;

foreach my $word(@strAry) 
{
    print "\nFOUND WORD: ".$word;
    my $exstCnt = $strHash{$word};

    if(defined($exstCnt)) 
    {
        $exstCnt++;
    } 
    else 
    {
        $exstCnt = 1;
    }

    $strHash{$word} = $exstCnt;
}

print "\n\nNOW REPORTING UNIQUE WORDS:\n";

foreach my $unqWord(sort(keys(%strHash))) 
{
    my $cnt = $strHash{$unqWord};
    print "\n".$unqWord." - ".$cnt." instances";
}

 use YAML qw(Dump);
 use 5.010;

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";
 my @match_words = $str =~ /(\w+)/g;
 my $word_hash = {};
 foreach my $word (sort @match_words) {
     $word_hash->{$word}++;
 }
 say Dump($word_hash);

 Output:

 Hello: 1
 I: 1
 Oh: 1
 Thats: 1
 are: 2
 fantastic: 1
 glad: 1
 good: 1
 happy: 1
 how: 1
 m: 1
 thats: 2
 today: 1
 wonderful: 1
 you: 2
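
Note that \w+ breaks I'm into the two keys I and m. If contractions should stay together, one option is to allow the apostrophe in the matched class. This is a sketch under that assumption; the sub name count_matches is made up for illustration, and \w also matches digits and underscores.

```perl
use strict;
use warnings;

# Hypothetical helper: count runs of word characters and apostrophes.
sub count_matches {
    my ($text) = @_;
    my %count;
    # In list context, a /g match without capture groups returns
    # every complete match, so each word is visited once.
    $count{$_}++ for $text =~ /[\w']+/g;
    return \%count;
}

my $counts = count_matches(
    "Hello how are you today. Oh thats good I'm glad you are happy. "
  . "Thats wonderful; thats fantastic."
);
printf "%-10s => %d\n", $_, $counts->{$_} for sort keys %$counts;
```

The sort before counting in the code above is not needed for the counts themselves; sorting the keys when printing is enough.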
