
Perl script to read content between two markers

In Perl, how can I read the content between two markers? The source data looks like this:

START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34

I only want to get the data between "START_DATA" and "END_DATA". How can I do this?

sub readFile(){ 
    open(FILE, "<datasource.txt") or die "file is not found";

    while(<FILE>){      
        if(/START_DATA/){           
            record(\*FILE);#start record;
        }
    }
}

sub record($){
    my $fileHandle = $_[0];

    while(<fileHandle>){
        print $_."\n";      
        if(/END_DATA/) return ;         
    }
}

I wrote this code, but it doesn't work. Do you know why?

Thanks


You can use the range operator:

perl -ne 'print if /START_DATA/ .. /END_DATA/'

The output will include the *_DATA lines, too, but it should not be so hard to get rid of them.
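One way to drop the marker lines, as a minimal sketch: in scalar context the range operator returns a sequence number that starts at 1, and on the last line of the range that number has the string "E0" appended (this is described in perlop). The sub name between_markers and the in-memory sample data are assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between the two markers.
sub between_markers {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        # Scalar-context range (flip-flop) operator, bound to $line explicitly.
        my $seq = ($line =~ /START_DATA/) .. ($line =~ /END_DATA/);
        next unless $seq;         # outside the range
        next if $seq == 1;        # the START_DATA line itself
        next if $seq =~ /E0$/;    # the END_DATA line itself
        push @lines, $line;
    }
    return @lines;
}

# Sample data from the question, read through an in-memory filehandle.
my $sample = "START_HEAD\nddd\nEND_HEAD\n\nSTART_DATA\neee|234|ebf\n"
           . "qqq|              |ff\nEND_DATA\n\n--Generate at 2011:23:34\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print between_markers($fh);
close($fh);
```

With the sample data, only the two pipe-separated lines are printed.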

Besides a few typos, your code is not too far off. Had you used

use strict;
use warnings;

you might have figured it out yourself. Here's what I found:

  • Don't use prototypes unless you need them and know what they do. A sub declaration can carry a prototype, as in sub my_function (prototype) { , but you can leave it out and just write sub my_function { .
  • while (<fileHandle>) { is missing the $ sigil, so Perl reads from a bareword filehandle named fileHandle rather than from your scalar variable. It should be <$fileHandle> .
  • print $_."\n"; adds an extra newline, because each line read from the file still ends with its own newline. A plain print; does what you expect.
  • if(/END_DATA/) return; is a syntax error. Braces around the block are mandatory in Perl, unless you use the statement-modifier form, where the condition follows the statement.

Use either:

return if (/END_DATA/);

or

if (/END_DATA/) { return }

Below is the cleaned-up version. I commented out your open() and read from the DATA filehandle instead, so that this is a self-contained, working example.

use strict;
use warnings;

readFile();

sub readFile { 
    #open(FILE, "<datasource.txt") or die "file is not found";
    while(<DATA>) {      
        if(/START_DATA/) {
            recordx(\*DATA); #start record;
        }
    }
}

sub recordx {
    my $fileHandle = $_[0];
    while(<$fileHandle>) {
        print;
        if (/END_DATA/) { return }         
    }
}

__DATA__
START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34
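For reading from the actual file, the same logic can be sketched with a lexical filehandle and the three-argument form of open, which avoids passing a glob reference such as \*FILE around. This variant uses a single loop with a flag instead of two subs; the sub name read_data_section and its $path parameter are illustrative assumptions, and datasource.txt is the file name from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line after a START_DATA line, up to
# and including the END_DATA line, like the cleaned-up version prints.
sub read_data_section {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    my @lines;
    my $in_data = 0;
    while (my $line = <$fh>) {
        if ($in_data) {
            push @lines, $line;
            $in_data = 0 if $line =~ /END_DATA/;   # leave the data section
        }
        elsif ($line =~ /START_DATA/) {
            $in_data = 1;                          # enter the data section
        }
    }
    close($fh);
    return @lines;
}

# Only runs when the file from the question exists in the current directory.
print read_data_section('datasource.txt') if -e 'datasource.txt';
```

Including $! in the die message reports why the open failed, which is more informative than a fixed "file is not found".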

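This is fairly simple to do with a regular expression, provided the whole file has first been read into a single string. Use the /s (single-line) flag, which lets . match newlines, so you can write /start_data(.+)end_data/is . The /i flag makes the match case-insensitive, which is why the lowercase pattern still matches the markers. The /m (multi-line) flag only changes where ^ and $ match, so it is not needed here. If the file can contain more than one data block, use the non-greedy (.+?) so the match stops at the first end marker.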
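A minimal sketch of the regex approach, assuming the file is small enough to read into memory at once. The sub name extract_data is hypothetical, the markers are matched case-sensitively here, and the sample data is read through an in-memory filehandle so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file content as one string and
# returns the text between the markers, or undef if they are not found.
sub extract_data {
    my ($content) = @_;
    # /s lets . match newlines; .*? is non-greedy, so the match stops at
    # the first END_DATA; the \n drops the newline after START_DATA.
    return $content =~ /START_DATA\n(.*?)END_DATA/s ? $1 : undef;
}

my $sample = "START_HEAD\nddd\nEND_HEAD\n\nSTART_DATA\neee|234|ebf\n"
           . "qqq|              |ff\nEND_DATA\n\n--Generate at 2011:23:34\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my $content = do { local $/; <$fh> };   # undefined $/ reads everything at once
close($fh);

my $data = extract_data($content);
print $data if defined $data;
```

Unlike the line-by-line versions, this holds the entire file in memory, so it is best suited to small inputs.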


 