perl - How to extract lines from a file based on their position

I am processing a text file to extract lines that contain a timestamp and then performing a calculation on those timestamps. Each line contains a timestamp followed by a message, and I run a regular expression against the message to pick out the lines I want.

TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

Below is pseudocode of the regular expression matching I'm carrying out on the file:

... .... ... 


open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
for my $line (<$FH>) {
    if ($line =~ /bar/) {
        my $ts1 = ExtractTimestamp($line);
    } elsif ($line =~ /FOO/) {
        my $ts2 = ExtractTimestamp($line);
    }
}
my $diff = $ts2 - $ts1;

The problem here is that the regular expression finds the first occurrence of the line and extracts that, which leaves me with negative timestamp differences. Are there any modules in Perl, or any technique, that would let me extract only the occurrences of, let's say, FOO that appear in the file after BAR?

I would appreciate any help here!

This solution uses the range operator to find the first BAR line and the first FOO line after it. The time in a record is pushed onto the array @ts if that record is either the first or the last line in the range.

use strict;
use warnings;

my @ts;
while ( <DATA> ) {
    next unless my $state = /BAR/ .. /FOO/;
    push @ts, /([\d:.]+)/ if $state == 1 or $state =~ /E/;
}

print join(' ... ', @ts), "\n";

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

output

20:48:47.353 ... 20:48:52.192
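
To make the `$state` test above easier to follow, here is a minimal sketch that records what the scalar range operator returns for each sample line. The helper name `range_states` and the `-` placeholder for a false value are made up for this illustration; the lines themselves are copied from the question.

```perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    "TIME | MESSAGE",
    "20:48:27.159 | FOO",
    "20:48:47.353 | BAR",
    "20:48:49.227 | SPAM",
    "20:48:52.192 | FOO",
);

# Hypothetical helper: collect the value of the scalar range operator
# for every line. The operator is false outside the range, counts
# 1, 2, 3, ... inside it, and appends "E0" on the line that closes it.
sub range_states {
    my @states;
    for (@_) {
        my $state = /BAR/ .. /FOO/;
        push @states, $state || '-';    # show false values as '-'
    }
    return @states;
}

print join(' ', range_states(@lines)), "\n";    # prints: - - 1 2 3E0
```

The FOO on the second line is ignored because the range has not been opened by a BAR yet. `$state == 1` picks out the opening line, and `$state =~ /E/` picks out the closing one.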

open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
# declare $ts1 and $ts2 OUTSIDE the loop so they are still in scope afterwards
my ( $ts1, $ts2 );
while ( my $line = <$FH> ) {
    if ($line =~ /BAR/) {
        $ts1 = ExtractTimestamp($line);
    }
    # ignore any FOO seen before the first BAR has set $ts1
    elsif ( defined($ts1) and $line =~ /FOO/ ) {
        $ts2 = ExtractTimestamp($line);
        # stop searching after the first BAR and "FOO after BAR" pair
        last;
    }
}
# only compute the difference if both BAR and the "FOO after BAR" were found
if ( defined($ts1) and defined($ts2) ) {
    my $diff = $ts2 - $ts1;
    ...
}
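
The question never shows `ExtractTimestamp`, and the subtraction only makes sense if it returns a number. The following is one possible sketch, assuming the timestamp is always a leading `HH:MM:SS.mmm` field and that both lines fall on the same day. The body is an assumption for illustration, not the asker's actual routine.

```perl
use strict;
use warnings;

# Hypothetical implementation: return the leading HH:MM:SS(.fraction)
# timestamp as seconds since midnight, or undef if the line has none.
sub ExtractTimestamp {
    my ($line) = @_;
    my ( $h, $m, $s ) = $line =~ /^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)/
        or return;
    return $h * 3600 + $m * 60 + $s;
}

my $ts1 = ExtractTimestamp("20:48:47.353 | BAR");
my $ts2 = ExtractTimestamp("20:48:52.192 | FOO");
printf "%.3f\n", $ts2 - $ts1;    # prints: 4.839
```

A pair that straddles midnight would still give a negative difference with this approach. Handling that would need a date as well as a time.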

There are several ways to do this in Perl, depending on precisely what you want to accomplish. If I'm reading you right, you want to find both the FOO and BAR timestamps, presumably to extract a delta?

The key question would be: are FOO and BAR always exactly matched?

I mean, you could do it via a multi-line regex:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

local $/;

my ( $bar, $foo )  =  <DATA> =~ m/^(\d\S+) \| BAR.*?(\d\S+) \| FOO$/ms;
print "BAR: $bar\nFOO: $foo\n";

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

This will match the first instance of a paired 'BAR' and 'FOO'. (You can capture multiple pairs if you use the g flag on your regex.)
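
As a rough sketch of the g-flag variant, the same pattern can be run in a `while` loop so that each iteration resumes where the previous match ended. The helper name `bar_foo_pairs` is made up, and the last two input lines are invented here to provide a second pair; they are not part of the original data.

```perl
use strict;
use warnings;

# Sample input from the question, plus two invented lines
# (20:49:01.000 and 20:49:03.500) to form a second BAR/FOO pair.
my $text = <<'END';
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
20:49:01.000 | BAR
20:49:03.500 | FOO
END

# Hypothetical helper: return every [BAR time, FOO time] pair.
# With /g in scalar context, each loop iteration continues the
# search from the end of the previous match.
sub bar_foo_pairs {
    my ($input) = @_;
    my @pairs;
    while ( $input =~ m/^(\d\S+) \| BAR.*?(\d\S+) \| FOO$/msg ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

print "BAR: $_->[0]  FOO: $_->[1]\n" for bar_foo_pairs($text);
```

The leading FOO at 20:48:27.159 is skipped because a match can only start on a BAR line.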

Alternatively, you can set the record separator to FOO:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

local $/ = "FOO\n"; 

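while ( <DATA> ) {
   my ( $foo ) = m/(\S+) \| FOO/;
   my ( $bar ) = m/(\S+) \| BAR/;
   # a record that ends at a FOO with no preceding BAR has nothing to pair
   next unless defined $foo and defined $bar;
   print "$foo $bar\n";
}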

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
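
To see what the record separator actually does to the input, this small sketch reads the same sample through an in-memory filehandle and reports what each record contains. The helper name `records_ending_in_foo` is an assumption made for the illustration.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $text = <<'END';
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
END

# Hypothetical helper: split the input into records that each end in "FOO\n".
sub records_ending_in_foo {
    my ($input) = @_;
    local $/ = "FOO\n";    # restored automatically when the sub returns
    open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = records_ending_in_foo($text);
for my $i ( 0 .. $#records ) {
    my $lines   = () = $records[$i] =~ /\n/g;    # count newlines in the record
    my $has_bar = $records[$i] =~ /BAR/ ? 'yes' : 'no';
    print "record ", $i + 1, ": $lines lines, contains BAR: $has_bar\n";
}
```

With this data the first record is the header plus the leading FOO, so it contains no BAR line. The second record runs from BAR through to the next FOO.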

Or do what you're already doing, iterating line by line:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my $last_bar;
while (<DATA>) {

    if (m/^(\d\S+) \| BAR/) {
        $last_bar = $1;
    }
    if ( my ($foo) = m/^(\d\S+) \| FOO/ ) {
        if ($last_bar) {
            print "$foo $last_bar\n";
        }
        else {
            print "Unmatched:\n";
            print;
        }
        $last_bar = undef;
    }
}

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
