
Perl script to read the content between two markers

In Perl, how do I read the content between two markers? The source data looks like this:

START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34

I only want to get the data between "START_DATA" and "END_DATA". How can I do this?

sub readFile(){ 
    open(FILE, "<datasource.txt") or die "file is not found";

    while(<FILE>){      
        if(/START_DATA/){           
            record(\*FILE);#start record;
        }
    }
}

sub record($){
    my $fileHandle = $_[0];

    while(<fileHandle>){
        print $_."\n";      
        if(/END_DATA/) return ;         
    }
}

I wrote this code, but it doesn't work. Do you know why?

Thanks

You can use the range operator:

perl -ne 'print if /START_DATA/ .. /END_DATA/'

The output will also include the START_DATA and END_DATA lines, but they are not hard to filter out.
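One way to drop the marker lines is to look at the value the range operator returns. In scalar context it yields a sequence number: false outside the range, 1 on the first matching line, and a value ending in "E0" on the last one. The following is a minimal sketch, not the only way to do it; the helper name lines_between_markers and the in-memory sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between the two markers.
sub lines_between_markers {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        # Scalar-context range operator (flip-flop) returns a sequence number.
        my $seq = ($line =~ /START_DATA/) .. ($line =~ /END_DATA/);
        next unless $seq;          # outside the range
        next if $seq == 1;         # the START_DATA line itself
        next if $seq =~ /E0$/;     # the END_DATA line itself
        push @lines, $line;
    }
    return @lines;
}

# Sample input, held in a string and read through an in-memory filehandle.
my $sample = <<'END_SAMPLE';
START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA
END_SAMPLE

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print lines_between_markers($fh);   # prints only the two data lines
close($fh);
```

Note that the flip-flop keeps its state between calls, so this sketch assumes every START_DATA is eventually followed by an END_DATA.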

Besides a few typos, your code is not too far off. Had you used

use strict;
use warnings;

you might have figured it out yourself. Here's what I found:

  • Don't use prototypes unless you need them and know what they do.

A sub declared with a prototype looks like sub my_function (prototype) { , but you can leave the prototype out and just write sub my_function { .

  • while (<fileHandle>) { is missing the $ sigil that marks it as a scalar variable rather than a bareword (global) filehandle. It should be <$fileHandle> .
  • print $_."\n"; will add an extra newline, because each line read from the file already ends with one. Just print; will do what you expect.
  • if(/END_DATA/) return; is a syntax error. The braces around the block are not optional in Perl in this case, unless you reverse the statement.

Use either:

return if (/END_DATA/);

or

if (/END_DATA/) { return }

Below is the cleaned-up version. I commented out your open() while testing and read from the DATA filehandle instead, so this is a working code example.

use strict;
use warnings;

readFile();

sub readFile { 
    #open(FILE, "<datasource.txt") or die "file is not found";
    while(<DATA>) {      
        if(/START_DATA/) {
            recordx(\*DATA); #start record;
        }
    }
}

sub recordx {
    my $fileHandle = $_[0];
    while(<$fileHandle>) {
        print;
        if (/END_DATA/) { return }         
    }
}

__DATA__
START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA

--Generate at 2011:23:34
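If you put the open() call back, it may be worth switching to a lexical filehandle and the three-argument form of open, which avoids the global FILE handle altogether. The sketch below keeps the same two-sub structure but returns the lines instead of printing them; the sub names read_file and record, and the temporary file used for the demo, are assumptions for illustration rather than part of the original code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Collect lines after START_DATA, up to and including the END_DATA line.
sub record {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
        last if $line =~ /END_DATA/;
    }
    return @lines;
}

# Open the file with a lexical handle and hand it to record() at each marker.
sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    my @data;
    while (my $line = <$fh>) {
        push @data, record($fh) if $line =~ /START_DATA/;
    }
    close($fh);
    return @data;
}

# Demo: write sample data to a temporary file, then read it back.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "START_HEAD\nddd\nEND_HEAD\n\nSTART_DATA\neee|234|ebf\nEND_DATA\n";
close($tmp_fh);

print read_file($tmp_path);   # prints "eee|234|ebf" and "END_DATA"
```

Including $! in the die message reports the actual reason the open failed, which is not always "file is not found".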

This is a pretty simple thing to do with regular expressions, provided the whole input is held in a single string. Just use the /s or /m (single line or multiple line) flags: /s allows the . operator to match newlines, so you can do /start_data(.+)end_data/is .
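A minimal sketch of the regex approach, assuming the input is small enough to read into memory at once. The helper name extract_data_block is hypothetical, and the pattern uses the non-greedy .*? instead of .+ because a greedy match would run on to the last END_DATA if the input contained more than one block.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first pair of markers,
# or undef (in scalar context) if no complete block is found.
sub extract_data_block {
    my ($text) = @_;
    # /s lets . match newlines; .*? stops at the first END_DATA.
    if ($text =~ /START_DATA\n(.*?)END_DATA/s) {
        return $1;
    }
    return;
}

my $sample = <<'END_SAMPLE';
START_HEAD
ddd
END_HEAD

START_DATA
eee|234|ebf
qqq|              |ff
END_DATA
END_SAMPLE

# Slurp the whole input into one string by undefining the record separator.
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close($fh);

my $block = extract_data_block($text);
print $block if defined $block;   # prints only the two data lines
```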
