Keep track of indentation-based state with flip-flop operator

I am trying to get familiar with the flip-flop operator so that I have it as an additional abstraction in my head for stateful looping, even though a textbook-style state machine works perfectly well in such a case (if verbosely and with many variables). I want to keep track of indentation, and it seems I would still need to adjust the indentation level manually at the start of every if block whose condition calls my indenting flip-flop, right? Here's what I came up with:

Program:

use v5.20;
use strict;
use warnings;

my $shiftwidth = 3;

# block_rx: start of indented block marker, without leading spaces
# Keeps state of indentation, which is increased on encountering block marker
# and decreased on matching outdent.
# Function should always get indentation level from context it was called in.
# Returns: true if in indented block, ^ff^, else false

sub indenting_flipflop {
    my $block_rx = $_[0];
    $_ = $_[1];
    my $level = $_[2];
    my $indent = indent($level);
    my $inner_indent = indent($level + 1);
    return ((/^$indent$block_rx/) ... (!/^$inner_indent/)) =~ s/.*E//r;
}

sub indent {
    return ' ' x ($shiftwidth * $_[0]);
}

while (<DATA>) {
    my $level = 0;
    if (indenting_flipflop('books', $_, $level)) {
        $level++;
        if (indenting_flipflop('book', $_, $level)) {
            $level++;
            if (/author: (.*)/) {
                say $1;
            }
        }
    }
}

__DATA__
books:
   book:
      author: Mark Twain
      price: 10.99
   game:
      author: Klaus Teuber
      price: 15.99
   book:
      author: Jane Austen
      price: 12.00

books:
   book:
      author: Mark Twain
      price: 10.99
   game:
      author: Klaus Teuber
      price: 15.99
   book:
      author: Jane Austen
      price: 12.00

Expected output:

Mark Twain
Jane Austen
Mark Twain
Jane Austen

Actual output:

Mark Twain
Klaus Teuber
Jane Austen
Mark Twain
Klaus Teuber
Jane Austen

It would also be nice if I didn't have to adjust $level manually in the loop.

Flip-flop operators with dynamic operands are tricky to use and may not do what you expect. Perl maintains a single "state" for each flip-flop operator that appears in the code, not a separate state for each expression supplied as operands to the flip-flop operator.

Consider this code:

sub foo { m[<foo>] .. m[</foo>] }
sub bar { m[<bar>] .. m[</bar>] }

while (<DATA>) {    
    print "FOO:$_" if foo();
    print "BAR:$_" if bar();
}    
__DATA__
<foo>
   <bar>
      123
   </bar>
   <baz>
       456
   </baz>
</foo>

The output is:

FOO:<foo>
FOO:   <bar>
BAR:   <bar>
FOO:      123
BAR:      123
FOO:   </bar>
BAR:   </bar>
FOO:   <baz>
FOO:       456
FOO:   </baz>
FOO:</foo>

So far, so good, right? This approach won't scale well when there are 100 different tags to track instead of two, so let's try this code:

sub ff { my $tag = shift; m[<$tag>] .. m[</$tag>] }
while (<DATA>) {
    print "FOO:$_" if ff("foo");
    print "BAR:$_" if ff("bar");
}
__DATA__
<foo>
   <bar>
      123
   </bar>
   <baz>
       456
   </baz>
</foo>

Now the output is:

FOO:<foo>
BAR:<foo>
FOO:   <bar>
BAR:   <bar>
FOO:      123
BAR:      123
FOO:   </bar>
BAR:   </bar>

What happened? BAR is always printed with the same lines as FOO, and the last line of output is the </bar> line, even though there is more data still enclosed in <foo></foo> tags.

What happened is that the code contains a single flip-flop operator, defined in the ff subroutine, and this operator maintains a single state. The state changes to "true" when ff("foo") is called with the first line of input, and it remains "true" until it encounters input and an operand that satisfies the second expression in the flip-flop operator, which happens with the 4th line when ff("bar") is called. It is not maintaining separate state for foo tags and bar tags as the first example did.

Passing different input to the indenting_flipflop function and expecting the single flip-flop operator in that function to keep separate state for each kind of input will not work.


Update: this approach, which defines one new function per tag, works:

sub fff { my $tag = shift; sub { m[<$tag>] .. m[</$tag>] } }
my $foo = fff("foo");
my $bar = fff("bar");
while (<DATA>) {
    print "FOO:$_" if $foo->();
    print "BAR:$_" if $bar->();
}
__DATA__
...

but this one (defining new functions with every line of input) does not:

sub fff { my $tag = shift; sub { m[<$tag>] .. m[</$tag>] } }
while (<DATA>) {
    print "FOO:$_" if fff("foo")->();
    print "BAR:$_" if fff("bar")->();
}
__DATA__
...
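
One way to see why the per-line version fails: each call to fff("foo") builds a brand-new closure, and a new closure appears to start with its own flip-flop state set to false, so it can only return true on a line that itself matches the opening tag. The sketch below compares a closure created once with one created on every line. The helper name scan and the in-memory @lines array are illustrative stand-ins for the DATA loop, not part of the original code.

```perl
use strict;
use warnings;

# Same generator as above: returns a closure wrapping one flip-flop.
sub fff { my $tag = shift; sub { m[<$tag>] .. m[</$tag>] } }

# Made-up input standing in for <DATA>.
my @lines = ("<foo>\n", "   123\n", "</foo>\n", "456\n");

# Hypothetical helper: for every line, asks $get_ff for a flip-flop
# closure, runs it against $_, and records 1 (true) or 0 (false).
sub scan {
    my ($get_ff) = @_;
    my @seen;
    for (@lines) {
        push @seen, $get_ff->()->() ? 1 : 0;
    }
    return join '', @seen;
}

my $foo    = fff("foo");                   # created once, keeps its state
my $reused = scan(sub { $foo });           # same closure on every line
my $fresh  = scan(sub { fff("foo") });     # new closure on every line

print "reused closure: $reused\n";
print "fresh closures: $fresh\n";
```

With this input the reused closure is true for the first three lines, while the fresh closures are true only for the <foo> line, because none of them lives long enough to remember that the range was opened.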

On the other other hand, a memoized version of it would work:

my %FF;
sub fff { my $tag = shift; $FF{$tag} //= sub { m[<$tag>] .. m[</$tag>] } }
while (<DATA>) {
    print "FOO:$_" if fff("foo")->();
    print "BAR:$_" if fff("bar")->();
}
__DATA__
...

I'm still not convinced that flip-flop operators would add any value to this problem, but to find out you would have to use memoized flip-flop operator generating functions. Replace

...
return ((/^$indent$block_rx/) ... (!/^$inner_indent/)) =~ s/.*E//r;

with

my %FF;
sub flipflopfunc {
    my ($expr1,$expr2) = @_;
    return $FF{$expr1}{$expr2} //= 
        sub { /^$expr1/ ... !/^$expr2/ };
}
...
return flipflopfunc("$indent$block_rx",$inner_indent)->() =~ s/.*E//r;

(not sure what the s/.*E//r is for)
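
As far as perlop describes it, a range operator in scalar context returns a sequence number while the range is active and appends the string "E0" to that number on the line that closes the range. Stripping everything up to the E therefore leaves "0" on the closing line, which appears to be how the question's code keeps the outdenting line out of the block. A minimal sketch of that effect, with made-up input lines and a hypothetical helper named trace:

```perl
use strict;
use warnings;

# Hypothetical helper: runs one flip-flop over the given lines and returns
# "raw=>stripped" pairs, where stripped is the value after s/.*E//r.
sub trace {
    my @out;
    for (@_) {
        my $raw      = (/^start/ ... /^end/);   # '' when false, else a sequence number
        my $stripped = $raw =~ s/.*E//r;        # "3E0" becomes "0"; other values are unchanged
        push @out, "$raw=>$stripped";
    }
    return @out;
}

print "$_\n" for trace("a", "start", "b", "end", "c");
```

Since the string "0" is false in Perl, the substitution turns the last line of the range from true into false while leaving every other return value alone.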
