Port awk one-liner to perl using regexes (Summing groups of data)

Given the following input:

£ cat problem
  Team 7
  John: 19
  Sue: 20
  Pam: 35
  Team 42
  Jeff: 12
  Sam: 3
  Phil: 26
  Jill: 10
  Team 9
  Bill: 19
  John: 7
  Linda: 15

I am trying to reproduce the following output in a perl one-liner:

£ awk '/Team/ {x=$2} /: *[0-9]+/ {myarray[x]+=$2}; END{for (key in myarray) {print "Team " key ": " myarray[key]}}' problem
Team 42: 51
Team 7: 74
Team 9: 41

This (basic) problem and its data actually come from a Perl tutorial: http://learnperl.scratchcomputing.com/tutorials/csss/

I'm interested in something close to the following, of which I have tried literally over 50 variations:

 £ perl -e 'while(<>){if (/Team/){$x = /(\d+)/;}else{/(\d+)/;$myarray{$x}+= {$1}; }} foreach $key (keys %myarray){print "$key\n";}' problem

I know that 1) these regexes are not returning the match (the += {$1} was an attempt to remedy that), and 2) even if they were, I am probably populating the hash incorrectly. Googling for potential solutions only turns up verbose, multiline code. Incidentally, I wanted to get the keys printing correctly before even moving on to the values.

Without examining the awk code, I'd write this perl:

perl -lne '
        if (/Team (\d+)/) {
            $t = $1;
        } elsif (/: (\d+)/) {
            $score{$t} += $1
        }
    } END {
        printf "Team %d: %d\n", $_, $score{$_} for keys %score
' problem
Team 7: 74
Team 9: 41
Team 42: 51

The -n option implicitly wraps a while (<>) {...} loop around the given code. See perldoc perlrun for all the details.

The } END { bit exploits that: the } closes the implicit while loop early, so you can collect data inside the loop and then process it in the END block once the loop is complete.


You can see how perl deals with one-liners by adding the -MO=Deparse option:

perl -MO=Deparse -lne '
        if (/Team (\d+)/) {
            $t = $1;
        } elsif (/: (\d+)/) {
            $score{$t} += $1
        }
    } END {
        printf "Team %d: %d\n", $_, $score{$_} for keys %score
' problem
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    if (/Team (\d+)/) {
        $t = $1;
    }
    elsif (/: (\d+)/) {
        $score{$t} += $1;
    }
}
sub END {
    printf "Team %d: %d\n", $_, $score{$_} foreach (keys %score);
}
-e syntax OK

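I'm not sure why it turns the END block into a subroutine, though. perlmod does note that these special blocks may be written with a leading sub, so this is presumably just how B::Deparse prints them.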


A couple of notes about your code:

  1. Use better variable names.

  2. $x = /(\d+)/

    • m// used in scalar context returns a true/false value, regardless of any capturing parentheses (ref: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators)
    • to assign the captured text to variables, you need list context on the left-hand side: ($a, $list, $of, $variables) = (a, list, of, values), so

       ($x) = /(\d+)/; # with parentheses on the left-hand side
  3. $myarray{$x}+= {$1}

    • when you use {...} as an expression, you are creating a hash reference
    • in this case, {$1}, the hash does not have enough elements: it needs an even-sized list of key-value pairs. Worse, += then treats the reference as a number (its memory address) instead of the captured score.
    • you just want $myarray{$x} += $1
  4. foreach $key (keys %myarray){...}

    • an alternate way to iterate over a hash is to use the each function in a while loop:

       while (my ($key, $value) = each %myarray) { print "Team $key: $value\n"; }
