
Perl: Using the flip flop function and extracting data from within the block read

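I have an array named @mytitles that contains many titles, e.g. title1, title2, and so on. I have a file named "Superdataset" that contains information related to each title. However, the information for title1 may be 6 lines long while the information for title2 may be 30 lines (the length is random). Each block of information (for titlex) starts with "Reading titlex" and ends with "Done reading titlex".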

From the lines of information for each title, I need to extract some data. Luckily, the data I need is always in the 2 lines just before "Done reading titlex".

So my "Superdataset" looks like:

Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

I need the total expenses and the total earnings. Any suggestions? PS: the titles in the array have complicated names, not as simple as titlex.

Here is a first step at pulling the data into a usable form.

use warnings;
use strict;
use autodie;

my $input_filename = 'example';
open my $input, '<', $input_filename;
my %data;
{
  my $current_title;

  while(<$input>){
    chomp;
    if( /^Reading (.*?)\s*$/ ){ # start of section
      $current_title = $1;
    }elsif( not defined $current_title ){ # outside of any section
      # invalid data
    }elsif( /^Done reading (.*)/ ){ # end of section
      die if $1 ne $current_title;
      $current_title = undef;
    }else{ # add an element of section to array
      push @{ $data{$current_title} }, $_;
    }
  }
}
close $input;

Use the resulting data structure to determine the total earnings and expenses.

my( $earnings, $expenses ) = ( 0, 0 ); # start at 0 so printing never sees undef
for my $list( values %data ){
  for( @$list ){
    if( /earnings are (\d+)/ ){
      $earnings += $1;
    }elsif( /expenses are (\d+)/ ){
      $expenses += $1;
    }
  }
}

print "earnings $earnings\n";
print "expenses $expenses\n";

Or, instead, print it out in a form that is more useful to a computer.

use YAML 'Dump';
print Dump \%data;

Output:

---
title1:
  - ' random info line1'
  - ' random info line2'
  - ' random info line3'
  - ' random info line4'
  - ' random info line5'
  - ' my earnings are 6000'
  - ' my expenses are 1000'
title2:
  - ' random info line6'
  - ' random info line7'
  - ' random info line8'
  - ' random info line9'
  - ' random info line10'
  - ' random info line11'
  - ' random info line12'
  - ' random info line13'
  - ' random info line14'
  - ' my earnings are 11000'
  - ' my expenses are 9000'
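
Since the question mentions an @mytitles array, the same structure could also be used to total only the titles of interest. The following is a minimal sketch, assuming %data has the shape shown in the dump above. The sub name totals_for, the shortened sample hash, and the extra title3 entry are illustrative assumptions and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample mirroring the shape of %data built by the parser above.
my %data = (
    title1 => [ ' random info line5',  ' my earnings are 6000',  ' my expenses are 1000' ],
    title2 => [ ' random info line14', ' my earnings are 11000', ' my expenses are 9000' ],
    title3 => [ ' my earnings are 500', ' my expenses are 200' ],
);

# Stand-in for the asker's @mytitles (assumed contents).
my @mytitles = ( 'title1', 'title2' );

# Sum earnings and expenses for the requested titles only.
sub totals_for {
    my ( $data, @titles ) = @_;
    my %total = ( earnings => 0, expenses => 0 );
    for my $title (@titles) {
        my $lines = $data->{$title} or next;    # skip titles not found in the file
        for my $line (@$lines) {
            # $1 is the label, $2 is the amount
            if ( $line =~ /(earnings|expenses) are (\d+)/ ) {
                $total{$1} += $2;
            }
        }
    }
    return %total;
}

my %total = totals_for( \%data, @mytitles );
print "earnings $total{earnings}\n";
print "expenses $total{expenses}\n";
```

Because the lookup is by hash key, complicated title names need no escaping here, provided they match the text after "Reading " exactly.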

Using the range operator, you could do something like this:

#!/usr/bin/env perl
use strict;
use warnings;
my $begin_stanza = qr/^Reading/i;
my $endof_stanza = qr/^Done reading/i;
my ( $title, @lines );
my ( $value, $total_earnings, $total_expenses );
while (<DATA>) {
    chomp;
    if ( m{$begin_stanza} .. m{$endof_stanza} ) {
        if ( m{$begin_stanza\s+(.+)} ) {
            $title = $1;
            @lines = ();
            next;
        }
        if ( m{$endof_stanza} ) {
            ($value) = ( $lines[0] =~ m{(\d+)} );
            $total_earnings += $value;
            ($value) = ( $lines[1] =~ m{(\d+)} );
            $total_expenses += $value;
            print join "\n", $title, @lines, "\n";
            next;
        }
        shift @lines if @lines == 2;
        push  @lines, $_;
    }
}
printf "Total Earnings = %7d\n", $total_earnings;
printf "Total Expenses = %7d\n", $total_expenses;
__DATA__
Reading title1
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

...which produces:

title1
 my earnings are 6000
 my expenses are 1000

title2
 my earnings are 11000
 my expenses are 9000

Total Earnings =   17000
Total Expenses =   10000
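
To make the behaviour of the range (flip-flop) operator in scalar context more concrete, here is a small sketch that records what the operator returns for each line. The sub name flip_flop_states and the sample lines are assumptions made for illustration. In scalar context the operator returns the empty string while it is false and a sequence number starting at 1 while it is true, with "E0" appended to the last number of a range.

```perl
use strict;
use warnings;

# Return the scalar value of the range operator for each input line.
sub flip_flop_states {
    my @states;
    for (@_) {
        # Becomes true at a "Reading" line and stays true through "Done reading".
        my $seq = /^Reading/ .. /^Done reading/;
        push @states, $seq;
    }
    return @states;
}

# Hypothetical sample input with noise outside the stanza.
my @input = (
    'noise before',
    'Reading title1',
    ' my earnings are 6000',
    ' my expenses are 1000',
    'Done reading title1',
    'noise after',
);

my @states = flip_flop_states(@input);
for my $i ( 0 .. $#input ) {
    printf "%-24s => '%s'\n", $input[$i], $states[$i];
}
```

For this input the recorded values are '', 1, 2, 3, 4E0 and '', so a value of 1 marks the start line of a stanza and a value ending in E0 marks its end line.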

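The flip-flop operator will not do much by way of optimization unless you can predict the line that comes before the relevant lines. I think it would be easier to use a buffer array and just match the line that comes after the earnings and expenses.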

#!/usr/bin/perl
use strict;
use warnings;

my @buffer;
my ($earnings, $expenses);

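# Read line by line, keeping a rolling window of the last three lines.
while ( my $line = <DATA> ) {
    shift @buffer if @buffer > 2;
    push @buffer, $line;

    next if $line !~ /^Done reading/;

    # On "Done reading", $buffer[0] and $buffer[1] hold the two lines before it.
    $earnings += $1 if $buffer[0] =~ /(\d+)$/;
    $expenses += $1 if $buffer[1] =~ /(\d+)$/;
}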
print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";

__DATA__
Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

Output:

Total earnings: 17000
Total expenses: 10000

