Perl script to read content between marks

Question

In Perl, how can I read the contents between two marks? The source data looks like this:

START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34

I only want to get the data between "START_DATA" and "END_DATA". How can I do this?

sub readFile(){ 
    open(FILE, "<datasource.txt") or die "file is not found";

    while(<FILE>){      
        if(/START_DATA/){           
            record(\*FILE);#start record;
        }
    }
}

sub record($){
    my $fileHandle = $_[0];

    while(<fileHandle>){
        print $_."\n";      
        if(/END_DATA/) return ;         
    }
}

I wrote this code, but it doesn't work. Do you know why?

Thanks

Answer 1

You can use the range operator:

perl -ne 'print if /START_DATA/ .. /END_DATA/'

The output will include the *_DATA lines, too, but it should not be so hard to get rid of them.
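
One way to drop the marker lines is to look at the value the range operator returns. In scalar context it yields a sequence number: 1 on the line that opens the range, and a number with "E0" appended on the line that closes it. The sketch below relies on that; the sub name lines_between_marks is made up for illustration and is not part of the original one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between
# START_DATA and END_DATA, without the marker lines themselves.
sub lines_between_marks {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Scalar-context range operator (flip-flop): false outside the
        # range, otherwise a sequence number starting at 1.
        my $seq = /START_DATA/ .. /END_DATA/;
        next unless $seq;             # outside the range
        next if $seq == 1;            # the START_DATA line
        next if $seq =~ /E0$/;        # the END_DATA line
        push @kept, $_;
    }
    return @kept;
}
```

With a filehandle $fh opened on the data file, print lines_between_marks(<$fh>); prints only the two data rows. Note that the flip-flop keeps its state between calls, so this assumes every START_DATA in the input has a matching END_DATA.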

Answer 2

Besides a few typos, your code is not too far off. Had you used

use strict;
use warnings;

you might have figured it out yourself. Here's what I found:

Don't use prototypes unless you need them and know what they do.

A sub declaration with a prototype is written sub my_function (prototype) { , but you can leave out the prototype and just write sub my_function { .

while (<fileHandle>) { is missing the $ sigil, so Perl reads from a bareword filehandle named fileHandle instead of from your scalar variable. It should be <$fileHandle> .
print $_."\n"; will add an extra newline, because each line read from the file still ends with its own newline. Just print; will do what you expect.
if(/END_DATA/) return; is a syntax error. In Perl the braces around the body of an if statement are not optional, unless you reverse the statement and use if as a statement modifier.

Use either:

return if (/END_DATA/);

or

if (/END_DATA/) { return }

Below is the cleaned-up version. I commented out your open() while testing and read from the DATA filehandle instead, so that this is a self-contained, working example.

use strict;
use warnings;

readFile();

sub readFile { 
    #open(FILE, "<datasource.txt") or die "file is not found";
    while(<DATA>) {      
        if(/START_DATA/) {
            recordx(\*DATA); #start record;
        }
    }
}

sub recordx {
    my $fileHandle = $_[0];
    while(<$fileHandle>) {
        print;
        if (/END_DATA/) { return }         
    }
}

__DATA__
START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34
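
To go back from the DATA test harness to a real file, something along these lines should work. It is only a sketch: the sub name read_data_section and the $recording flag are mine, and I have assumed the markers start at the beginning of a line, as they do in your sample. It uses a lexical filehandle and the three-argument form of open, and it returns the lines instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines between START_DATA and END_DATA
# from the named file, excluding the marker lines.
sub read_data_section {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    my @section;
    my $recording = 0;    # true once START_DATA has been seen
    while (my $line = <$fh>) {
        if ($line =~ /^START_DATA/) {
            $recording = 1;
            next;
        }
        # Stop reading as soon as the closing marker is reached.
        last if $recording and $line =~ /^END_DATA/;
        push @section, $line if $recording;
    }
    close($fh);
    return @section;
}
```

Calling print read_data_section('datasource.txt'); would then print just the data rows from the file in your question.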

Answer 3

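This is a pretty simple thing to do with a regular expression, provided the whole file has been read into a single string. Use the /s or /m flags (single-line or multi-line): /s allows the . metacharacter to match newlines, while /m makes ^ and $ match at the start and end of each line. With /s (plus /i for case-insensitive matching) you can do /start_data(.+)end_data/is .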
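
A minimal sketch of that approach, assuming the file is small enough to hold in memory as one string. The sub name extract_data_block is hypothetical, and I have used a non-greedy .*? together with /m anchors so that the match stops at the first END_DATA line and leaves out the markers themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: given the whole file as one string, return the
# text between the START_DATA and END_DATA lines, or undef if absent.
sub extract_data_block {
    my ($text) = @_;
    # /m: ^ and $ match at line boundaries inside the string.
    # /s: . also matches newlines, so .*? can span several lines.
    if ($text =~ /^START_DATA\n(.*?)^END_DATA$/ms) {
        return $1;
    }
    return;
}
```

To use it, slurp the file first, for example with local $/; my $text = <$fh>; on an already opened filehandle $fh, and then pass $text to the sub.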

3 answers

solution1
6 2011-11-06 01:00:22

solution2
3 ACCEPTED 2011-11-06 01:35:07

solution3
0 2011-11-06 00:53:43
