How to extract the data between two strings from a text file, in sequence or in a controlled manner, when more than one such instance is found

Sample Input Data file :
================

Session Initiation Protocol (REGISTER)
temp data here
Rocky1
Rocky2
Rocky3
Rocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
Jocky2
Jocky3
Jocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
Hello
world
Bye
temp data here
CSeq: 3 REGISTER

For example, in the data above I want to extract the data between variable 1 (Session Initiation Protocol (REGISTER)) and variable 2 (CSeq: 3 REGISTER):

temp data here

Rocky1
Rocky2
Rocky3
Rocky4

Variable 1 and variable 2 occur multiple times in the input file, and the data between them differs each time, so I want to control each occurrence separately in order to process it further.

Below is the program I used to extract the data. It extracts data from all the occurrences, but it gives me no control if I want to extract only up to the first occurrence of variable 1 and variable 2.

#!/usr/bin/perl

use strict;
use warnings;
my $file = "output.txt";


my $kw1 = "Session Initiation Protocol (REGISTER)";
my $kw2 = "CSeq: 3 REGISTER";   

while (<DATA>) {

   if ( /\Q$kw2\E/ ... /\Q$kw1\E/ ) {
      print;
   }
}

Update: the more recent issue is added here.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

my $instance_counter;
my @first;
my @next;
my $myfile = "Input.txt";
open my $out_file1, '>', 'hello1.txt' or die "$!";
open my $out_file2, '>', 'hello2.txt' or die "$!";


open DATA, $myfile or die "Can't open file: $!";

while (<DATA>) {
    if (my $match = (/\Q$kw1/ .. /\Q$kw2/)) {
        ++$instance_counter if 1 == $match;

        if (1 == $instance_counter) {
            push @first, $_ if /$kw1/;

        } else {
            @next = @first if 1 == $match;
            shift @next;
            push @next , $_;
        }


    }
    print $out_file1 @first;
    print $out_file2 @next;
}

Let's say the following is my input data:

Session Initiation Protocol (REGISTER)
temp data here
Rocky1
Rocky2
Rocky3
Rocky4
I don't know the text here
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
Jocky2
Jocky3
Jocky4
I don't know the text here
CSeq: 3 REGISTER


I want my output to look like this:

output_1.txt
temp data here
Rocky1
Rocky2
Rocky3
Rocky4
I don't know the text here

output_2.txt
temp data here
Jocky1
Jocky2
Jocky3
Jocky4
I don't know the text here


#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

my $instance_counter;
my @first;
my @next;
my $myfile = "Input.txt";
open my $out_file1, '>', 'hello1.txt' or die "$!";
open my $out_file2, '>', 'hello2.txt' or die "$!";
open my $out_file3, '>', 'hello3.txt' or die "$!";

open DATA, $myfile or die "Can't open file: $!";

while (<DATA>) {
    if (my $match = (/\Q$kw1/ .. /\Q$kw2/)) {
        ++$instance_counter if 1 == $match;

        if (1 == $instance_counter) {
          print $out_file1 $_;
        } 
        elsif (2 == $instance_counter){
        print $out_file2 $_;
        }
        else {
           print $out_file3 $_;
        }


    }

}

I am now getting the output in separate files. Can I generalize it for any number of instances found in a file?

Problem 1: you have the range backwards; it should start at $kw1 and end at $kw2. Also, it's unclear why you used ... instead of .. , as the two expressions never match on the same line.

Note that the range operator returns the sequence number of the current line within the range, with E0 appended on the last line, so you can easily detect when the closing expression matches:

while (<DATA>) {
    if (my $match = (/\Q$kw1/ .. /\Q$kw2/)) {
        print;
        last if $match =~ /E0/;
    }
}
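
To make the return values of the range operator visible, here is a minimal sketch that records the value of the flip-flop for every line. The helper name range_values and the in-memory sample are assumptions made for illustration; only $kw1 and $kw2 come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

# range_values is a hypothetical helper: it returns the value of the
# flip-flop for each line, so the sequence numbers can be inspected.
sub range_values {
    my @lines = @_;
    my @values;
    for (@lines) {
        # False (empty string) outside a range; 1, 2, 3, ... inside it,
        # with "E0" appended on the line that closes the range.
        my $match = (/\Q$kw1/ .. /\Q$kw2/);
        push @values, $match;
    }
    return @values;
}

my @sample = (
    "junk before the first record\n",
    "Session Initiation Protocol (REGISTER)\n",
    "Rocky1\n",
    "CSeq: 3 REGISTER\n",
    "\n",
);

# Show empty (false) values as '' so they stay visible in the output.
print join(',', map { length($_) ? $_ : "''" } range_values(@sample)), "\n";
```

For this sample the values are an empty string, 1, 2, 3E0 and an empty string again, which is why testing for 1 finds the opening line and testing for E0 finds the closing line.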

So, to compare the first instance with each of the others, you can do:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

my $instance_counter;
my @first;
my @next;

while (<DATA>) {
    if (my $match = (/\Q$kw1/ .. /\Q$kw2/)) {
        ++$instance_counter if 1 == $match;

        if (1 == $instance_counter) {
            push @first, $_ if /ocky\d/;

        } else {
            @next = @first if 1 == $match;
            shift @next if /ocky\d/
                        && substr($_, 1) eq substr $next[0], 1;
        }

        if ($match =~ /E0$/ && $instance_counter > 1) {
            if (@next) {
                say scalar @next, " ockies missing in instance $instance_counter";
            } else {
                say "instance $instance_counter ok";
            }
        }
    }
}

__DATA__
Session Initiation Protocol (REGISTER)
temp data here
Rocky1
Rocky2
Rocky3
Rocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
Jocky2
Jocky3
Jocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
Qocky1
Qocky2
Hello
world
Bye
temp data here
CSeq: 3 REGISTER
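
The same instance counter idea can also be generalized to any number of instances, as asked in the update to the question. The following is only a sketch under some assumptions: the helper name collect_instances, the shortened in-memory sample, and the output_N.txt file names are chosen for illustration, and the marker lines themselves are left out of the output, as in the desired output shown in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

# collect_instances is a hypothetical helper: it reads a filehandle and
# returns one array reference of lines per $kw1 .. $kw2 block.
sub collect_instances {
    my ($fh) = @_;
    my @instances;
    while (my $line = <$fh>) {
        if (my $match = ($line =~ /\Q$kw1/ .. $line =~ /\Q$kw2/)) {
            if (1 == $match) {            # opening marker: start a new instance
                push @instances, [];
                next;
            }
            next if $match =~ /E0$/;      # closing marker: skip it
            push @{ $instances[-1] }, $line;
        }
    }
    return @instances;
}

# Demo on an in-memory sample; with a real file, open 'Input.txt' instead.
my $sample = <<'END';
Session Initiation Protocol (REGISTER)
temp data here
Rocky1
I don't know the text here
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
I don't know the text here
CSeq: 3 REGISTER
END

open my $in, '<', \$sample or die "Can't open sample: $!";
my @instances = collect_instances($in);
close $in;

# One file per instance: output_1.txt, output_2.txt, ...
for my $n (1 .. @instances) {
    my $name = "output_$n.txt";
    open my $out, '>', $name or die "Can't write $name: $!";
    print {$out} @{ $instances[$n - 1] };
    close $out;
}
```

Collecting the lines first keeps the file handling in one place. For very large inputs, opening each output file as soon as its opening marker is seen would avoid holding all instances in memory.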

You have blank lines after each record, so I'd suggest looking at $/ (the input record separator). Setting it to the empty string makes Perl read the input one paragraph at a time:

#!/usr/bin/perl

use strict;
use warnings;
my $file = "output.txt";


my $kw1 = "Session Initiation Protocol (REGISTER)";
my $kw2 = "CSeq: 3 REGISTER";

local $/ = '';
while (<DATA>) {
   next unless m/^Session/;
   s/Session Initiation Protocol.*//gm;
   s/^CSeq.*//gm;

   print "\nStart of record\n";
   print;
   print "\nEnd of Record\n";
}


__DATA__
Sample Input Data file :
================

Session Initiation Protocol (REGISTER)
temp data here
Rocky1
Rocky2
Rocky3
Rocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
Jocky2
Jocky3
Jocky4
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
Hello
world
Bye
temp data here
CSeq: 3 REGISTER

This way, each iteration of the loop will have a single 'record' that you can process.

Alternatively, you could create an array of records using (something like) split or a repeated (global) regex match.
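
As a rough sketch of the regex variant, the whole input can be read into one string and matched globally, which yields one array element per record. The helper name extract_records and the shortened sample are assumptions for illustration, and the pattern assumes each marker occupies a line of its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $kw1 = 'Session Initiation Protocol (REGISTER)';
my $kw2 = 'CSeq: 3 REGISTER';

# extract_records is a hypothetical helper: it takes the whole input as one
# string and returns the text between each $kw1 line and the next $kw2 line.
sub extract_records {
    my ($text) = @_;
    # /m: ^ and $ match at line boundaries; /s: . also matches newlines;
    # /g in list context returns the captured group of every match.
    my @records = $text =~ /^\Q$kw1\E\n(.*?)^\Q$kw2\E$/msg;
    return @records;
}

my $text = <<'END';
Sample Input Data file :
================

Session Initiation Protocol (REGISTER)
temp data here
Rocky1
CSeq: 3 REGISTER

Session Initiation Protocol (REGISTER)
temp data here
Jocky1
CSeq: 3 REGISTER
END

my @records = extract_records($text);
printf "record %d:\n%s\n", $_ + 1, $records[$_] for 0 .. $#records;
```

Because every record ends up in @records, any particular instance can then be addressed by its index, for example $records[0] for the first one.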
