I have been able to use the flip-flop operator to extract text in the past, when START and END were different. This time I'm having a lot of trouble, because my source file doesn't have distinct delimiters: START and END are the same. I want the flip-flop to become true when a line begins with a year (yyyy) and to keep pushing $_ to an array until another line begins with yyyy. The problem is that the flip-flop then switches off at that next START line, so the lines after it are skipped.
while (<SOURCEFILE>) {
    print if (/^2017/ ... /^2017/);
}
Running the code above on the sample data misses the body of the second message, which I also need. Maybe the flip-flop, which I thought was the best way to parse a multi-line file, won't work in this case? What I want is to start matching at the first line that begins with a date and keep matching up to the line before the next line that begins with a date.
The sample data is:
2017 message 1
Text
Text
Text
2017 message 2
more text
more text
more text
2017 message 3
yet more text
yet more text
yet more text
But I am getting:
2017 message 1
Text
Text
Text
2017 message 2
2017 message 3
yet more text
yet more text
yet more text
(the contents of message 2 are missing)
I can't rely on spacing or a different END delimiter in my source data. What I want is for each message to be printed (actually, to push @myarray, $_ and then test for matches), but I'm missing the lines below message 2 because the flip-flop is set to false. Is there any way to handle this with a flip-flop, or do I need to use something else? Thanks in advance to anyone who can help or advise.
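To see exactly where the range switches off, here is a small sketch that runs the question's `...` test over the sample data and prints the flip-flop's return value next to each line. Holding the data in a string (read through an in-memory filehandle) is just a convenience assumption. A value ending in `E0` marks the line that closed the range; an empty value means the line is outside it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same sample data as the question, kept in a string for convenience.
my $text = join '', map { "$_\n" }
    '2017 message 1', 'Text', 'Text', 'Text',
    '2017 message 2', 'more text', 'more text', 'more text',
    '2017 message 3', 'yet more text', 'yet more text', 'yet more text';

my @printed;    # the lines the question's loop would print
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    # Start and end patterns are identical, so "message 2" both ends the
    # first range (and is still included) and leaves its own body outside
    # any range until "message 3" switches it on again.
    my $seq = ( $line =~ /^2017/ ... $line =~ /^2017/ );
    push @printed, $line if $seq;
    printf "%-6s %s", ( $seq eq '' ? 'off' : $seq ), $line;
}
close $fh;
```

The `5E0` printed next to "2017 message 2" is the point where the range ends, which is why its body lines show up as `off`.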
Here is a way to go:
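use Modern::Perl;
use Data::Dumper;

my $part = -1;    # index of the current message
my $parts;        # array ref of array refs, one per message
while(<DATA>) {
    chomp;
    # The right-hand side is never true while reading lines, so once the
    # first /^2017/ line has been seen the range stays on to end of file.
    if (/^2017/ .. 1==0) {
        $part++ if /^2017/;    # each header starts a new group
        push @{$parts->[$part]}, $_;
    }
}
say Dumper $parts;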
__DATA__
2017 message 1
Text
Text
Text
2017 message 2
more text
more text
more text
2017 message 3
yet more text
yet more text
yet more text
Output:
$VAR1 = [
[
'2017 message 1',
'Text',
'Text',
'',
'Text',
''
],
[
'2017 message 2',
'more text',
'more text',
'',
'more text',
''
],
[
'2017 message 3',
'yet more text',
'yet more text',
'',
'yet more text'
]
];
I don't know how to do it with a flip-flop; I tried about a year ago. But I did the same thing with some plain logic:
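my $line_concat = "";    # text of the message collected so far
my $f = 0;               # set once the first dated line has been seen
while (<DATA>) {
    if (/^2017/ && !$f) {
        $f = 1;
    }
    if (/^2017/) {
        # a new header: emit the previous message, then start a fresh one
        print "$line_concat\n" if $line_concat ne "";
        $line_concat = "";
    }
    $line_concat .= $_ if $f;
}
print $line_concat if $line_concat ne "";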
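The question ultimately wanted to push each message into an array and then test for matches. Here is a minimal sketch of that, using the same "a header line starts a new message" logic but building an array of strings instead of printing. The in-memory sample data and the `more text` pattern are illustrative assumptions only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $text = join '', map { "$_\n" }
    '2017 message 1', 'Text', 'Text', 'Text',
    '2017 message 2', 'more text', 'more text', 'more text',
    '2017 message 3', 'yet more text', 'yet more text', 'yet more text';

my @messages;    # one string per message, header line included
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    if ( $line =~ /^2017/ ) {
        push @messages, $line;       # a header starts a new message
    }
    elsif (@messages) {
        $messages[-1] .= $line;      # body lines join the current message
    }
    # lines before the first header are ignored
}
close $fh;

# "Then test for matches": keep only messages that mention "more text".
my @hits = grep { /more text/ } @messages;
print "----\n$_" for @hits;
```

Because each element holds a whole message, any regex you apply with `grep` sees the header and body together.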
A flip-flop with a matched delimiter doesn't work too well, as you've found.
Have you considered setting $/ (the input record separator) instead?
For example:
#!/usr/bin/env perl
use strict;
use warnings;

local $/ = "2017 message";
my $count;
while ( <DATA> ) {
    print "\nStart of block:", ++$count, "\n";
    print;
    print "\nEnd of block:", $count, "\n";
}
__DATA__
2017 message 1
Text
Text
Text
2017 message 2
more text
more text
more text
2017 message 3
yet more text
yet more text
yet more text
It's not perfect, though, because $/ marks the end of a record rather than the start: each chunk ends with the next '2017 message', and there's an extra chunk before the first real one (so you get 4 chunks). You can re-splice it with judicious use of chomp, which removes a trailing $/ from the current chunk:
#!/usr/bin/env perl
use strict;
use warnings;

local $/ = "2017 message";
my $count;
while ( <DATA> ) {
    # remove the trailing '2017 message' (it belongs to the next block)
    chomp;
    # skip the empty first block
    next unless /\S/;
    print "\nStart of block:", ++$count, "\n";
    # re-add '2017 message' at the front
    print $/;
    print;
    print "\nEnd of block:", $count, "\n";
}
# (uses the same __DATA__ section as the previous example)
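A different technique (not the $/ approach above) avoids the re-splicing: read the whole input into one string and `split` it with a zero-width lookahead, so each header stays attached to the chunk it introduces. This sketch assumes every header starts with a four-digit year followed by a space and that the file fits in memory; for a real file you could slurp it with `my $text = do { local $/; <$fh> };`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $text = join '', map { "$_\n" }
    '2017 message 1', 'Text', 'Text', 'Text',
    '2017 message 2', 'more text', 'more text', 'more text',
    '2017 message 3', 'yet more text', 'yet more text', 'yet more text';

# Split just before every line that starts with a year. The lookahead is
# zero-width, so the header is kept; /m lets ^ match after each newline.
my @chunks = split /^(?=\d{4} )/m, $text;

for my $i ( 0 .. $#chunks ) {
    print "Start of block: ", $i + 1, "\n", $chunks[$i],
          "End of block: ", $i + 1, "\n";
}
```

Since a zero-length match at the start of the string never produces an empty field, there is no extra leading chunk to skip.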
Alternatively, how about a hash of arrays, where you update the 'target key' each time you hit a message header?
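#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my %messages;
my $message_id;
while ( <DATA> ) {
    chomp;
    # a header line switches the 'target key' to that message's number
    if ( m/^2017 message (\d+)/ ) { $message_id = $1 }
    push @{ $messages{$message_id} }, $_;
}
print Dumper \%messages;
# (uses the same __DATA__ section as the earlier examples)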
Note - I'm using a hash, not an array, because that's a bit more robust for message numbering that doesn't start from zero or isn't consecutive. (An array using this approach would have an empty 'zeroth' element.)
Note - it will also have 'empty' '' elements for your blank lines. You can filter these out if you wish, though.
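Filtering the blank lines is a one-line change. Here is a sketch of the hash-of-arrays approach with that filter, plus a guard for lines that appear before the first header (which would otherwise be pushed under an undefined key). The sample string deliberately contains blank lines; that, and reading from an in-memory filehandle, are assumptions for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $text = "2017 message 1\nText\n\nText\n"
         . "2017 message 2\nmore text\n\n"
         . "2017 message 3\nyet more text\n";

my %messages;
my $message_id;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    $message_id = $1 if $line =~ /^2017 message (\d+)/;
    next unless defined $message_id;    # skip anything before the first header
    next unless $line =~ /\S/;          # drop blank or whitespace-only lines
    push @{ $messages{$message_id} }, $line;
}
close $fh;

$Data::Dumper::Sortkeys = 1;    # print keys in a stable order
print Dumper \%messages;
```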
You just need a buffer that accumulates the lines until you find one matching /^20\d\d[ ]/ or end of file.
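my $in = 0;    # true while inside a 2017 record
my @buf;       # lines of the current record
while (<>) {
    # any dated header ends the current record
    if ($in && /^20\d\d[ ]/) {
        process(@buf);    # process() is your own handler for one record
        @buf = ();
        $in = 0;
    }
    # start (or continue) buffering once a 2017 header is seen
    push @buf, $_ if $in ||= /^2017[ ]/;
}
process(@buf) if $in;    # flush the last record at end of file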
We can rearrange the code so that records are processed in only one spot, which allows process to be inlined.
my $in = 0;
my @buf;
while (1) {
    $_ = <>;
    if ($in && (!defined($_) || /^20\d\d[ ]/)) {
        process(@buf);
        @buf = ();
        $in = 0;
    }
    last if !defined($_);
    push @buf, $_ if $in ||= /^2017[ ]/;
}
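To try the buffered version end to end, here is a self-contained sketch of the single-spot loop with a hypothetical `process` that just stores a copy of each record. The in-memory input is an assumption, and it adds a 2016 and a 2018 record around the sample data to show that only 2017 records are collected while any dated header still ends the current one.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $text = join '', map { "$_\n" }
    '2016 old message', 'old text',
    '2017 message 1', 'Text', 'Text', 'Text',
    '2017 message 2', 'more text', 'more text', 'more text',
    '2017 message 3', 'yet more text', 'yet more text', 'yet more text',
    '2018 new message', 'new text';

my @records;    # each element: array ref holding one record's lines

# Hypothetical stand-in for process(): keep a copy of the buffered lines.
sub process { push @records, [@_] }

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $in = 0;
my @buf;
while (1) {
    my $line = <$fh>;
    # A new dated header, or end of file, completes the current record.
    if ( $in && ( !defined $line || $line =~ /^20\d\d[ ]/ ) ) {
        process(@buf);
        @buf = ();
        $in  = 0;
    }
    last if !defined $line;
    push @buf, $line if $in ||= $line =~ /^2017[ ]/;
}
close $fh;
printf "%d records\n", scalar @records;
```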