Start reading a large file from a certain line in Perl

  1. I have to match two patterns (pattern_1 and pattern_2).
  2. The data to match for pattern_2 depends on pattern_1 (pattern_2 uses some data extracted from the pattern_1 match).
  3. pattern_2 always occurs after pattern_1.
  4. Once pattern_2 has been matched, I need to move back to the place where pattern_1 was matched and start again.

I have the following code:

open(DATA_IN, "<$in_file") or die "Couldn't open file $in_file, $!";
open(DATA_OUT, ">$out_file") or die "Couldn't open file $out_file, $!";
while(<DATA_IN>){
    if($_ =~ /pattern_1/){
        #extract some data
        open(DATA_TEMP, "<$in_file") or die "Couldn't open file $in_file, $!";
        TEMP: while(<DATA_TEMP>){
            if($_ =~ /pattern_2/){
                my $i = 0;
                my $line;
                while ($i<4){
                    $line = <DATA_TEMP>;
                    $i++;
                }
                print $line; #print the data 4 lines after the matched pattern_2
                last TEMP;
            }
        }
    }
}

It works, but every pattern_1 match reopens $in_file and scans it again from the start, which takes a long time. Can you suggest a way to read $in_file only from the pattern_1 match onwards?

You can use the seek() and tell() functions to move around in the file: tell() returns the current byte offset of a filehandle, and seek() moves the handle back to a saved offset. Something like the following:

open(DATA_IN, "<$in_file") or die "Couldn't open file $in_file, $!";
open(DATA_OUT, ">$out_file") or die "Couldn't open file $out_file, $!";
while(<DATA_IN>){
    if($_ =~ /pattern_1/){
        # Save the current position
        my $saved_position = tell(DATA_IN);

        # extract some data
        TEMP: while(<DATA_IN>){
            if($_ =~ /pattern_2/){
                my $i = 0;
                my $line;
                while ($i<4){
                    $line = <DATA_IN>;
                    $i++;
                }
                print $line; #print the data 4 lines after the matched pattern_2
                last TEMP;
            }
        }

        # Restore the saved position
        seek(DATA_IN, $saved_position, 0);
    }
}
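
The same tell()/seek() idea can also be written as a small self-contained routine, which may be easier to test on a sample file. This is only a sketch under a few assumptions: the sub name lines_after_matches, its parameters, and the choice to return the lines instead of printing them are illustrative and not part of the original code. It uses lexical filehandles and three-argument open.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line matching $pat1, scan forward for the
# first line matching $pat2, collect the line $offset lines after it, then
# jump back to just after the $pat1 line and continue the outer scan.
sub lines_after_matches {
    my ($in_file, $pat1, $pat2, $offset) = @_;
    my @found;

    open(my $in, '<', $in_file) or die "Couldn't open file $in_file, $!";

    while (my $line = <$in>) {
        next unless $line =~ $pat1;

        # Byte offset of the line that follows the pattern_1 match
        my $saved_position = tell($in);

        while (my $inner = <$in>) {
            next unless $inner =~ $pat2;

            # Read $offset further lines; keep the last one if it exists
            my $target;
            for (1 .. $offset) {
                $target = <$in>;
                last unless defined $target;
            }
            push @found, $target if defined $target;
            last;
        }

        # Return to the saved offset (whence 0 = relative to start of file);
        # seek also clears the end-of-file condition on the handle
        seek($in, $saved_position, 0) or die "Couldn't seek in $in_file, $!";
    }

    close($in);
    return @found;
}
```

A call such as lines_after_matches($in_file, qr/pattern_1/, qr/pattern_2/, 4) would return the same lines the loop above prints. Note that the file is still rescanned from each pattern_1 match up to the next pattern_2 match, so the saving comes from not reopening the file and not rereading everything before pattern_1.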


 