How can I do a multi-line match on the data returned from Perl's diamond operator?

Is there some trick to do multi-line regular expression matches with <> and loop over them? This example produces no matches when run on files that use \n as the newline separator:

while (<> =~ m/\n./) {
  print($.);
}

I need to know the line number where the match starts inside the while loop, as in the example.

The goal is to find all lines with fewer than 75 characters that are followed by a line starting with a space (the standard vCard way of splitting long lines):

while (<> =~ m/(^|\n).{0,74}\n /)

What are you trying to do in that regex? It looks like you are trying to find any case where a newline is followed by at least one character, and then print the line number ( $. ) of whatever matches that criterion.

If you don't mind my asking, what's the larger purpose here?

In any case, see this article for a clear discussion of multiline matching: Regexp Power

Edited after the move to SO: If what you really want is to find the lines with fewer than 75 characters that are followed by a line beginning with a space, I wouldn't use one regex. The description points to an easier and, I think, clearer solution: (1) select all lines with fewer than 75 characters (the length function is good for that); (2) for those lines, check whether the next line starts with a space. That gives you clear logic and an easy regex to write.

In response to the question about getting the "next" line: think of it the other way around. You want to check every next line, but only if the previous line was shorter than 75 characters. So how about this:

my $prev = <>; # Initialize $prev with the first line

while (<>) {
    # Add 1 to 75 for newline or chomp it perhaps?
    if (length $prev < 76) {
        print "$.: $_" if $_ =~ m/^\s/;
    }
    $prev = $_;
}

(Note that I don't know anything about the vCard format and that \s is broader than literally "a single space", so you may need to adjust that code to fit your problem better.)
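A minimal sketch of the same previous-line idea, reworked to report the number of the line where a folded entry starts rather than the number of the continuation line. The sub name short_lines_before_continuation and the sample data are hypothetical. It assumes that a continuation line begins with a single space or tab and that the 75-character limit does not count the line terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based numbers of lines shorter than
# 75 characters (line terminator excluded) that are directly followed by
# a line starting with a space or a tab.
sub short_lines_before_continuation {
    my @lines = @_;
    my @hits;
    for my $i (0 .. $#lines - 1) {
        (my $this = $lines[$i]) =~ s/\r?\n\z//;   # drop the terminator before measuring
        next unless length($this) < 75;
        push @hits, $i + 1 if $lines[$i + 1] =~ /^[ \t]/;
    }
    return @hits;
}

# Assumed sample input: one folded entry, a blank line, one ordinary entry.
my @sample = (
    "DESCRIPTION:This is a long descrip\n",
    " tion that exists o\n",
    " n a long line.\n",
    "\n",
    "FN:Example\n",
);
print join(",", short_lines_before_continuation(@sample)), "\n";   # prints 1,2
```

With /^\s/ in place of /^[ \t]/, line 3 would also be reported, because \s matches the newline that makes up the blank line after it.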

Did you remember to put the handle in multi-line mode by setting $/ to the empty string (paragraph mode) or to the undefined value (slurp the whole file)?

The following program does what you want:
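To make the effect of $/ concrete, here is a small sketch that reads the same text in line mode, paragraph mode, and slurp mode. The helper read_records and the sample string are hypothetical, and the in-memory filehandle only stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $text through an in-memory filehandle with the
# given input record separator and returns the records it produced.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;            # "" = paragraph mode, undef = slurp mode
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $text = "a\n b\n\nc\n";
printf "line mode:      %d records\n", scalar read_records($text, "\n");   # 4
printf "paragraph mode: %d records\n", scalar read_records($text, "");     # 2
printf "slurp mode:     %d records\n", scalar read_records($text, undef);  # 1
```

Only in paragraph or slurp mode does a single record contain a newline followed by more text, which is what a pattern such as /\n / needs in order to match.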

#! /usr/bin/perl

use warnings;
use strict;

$/ = "";

*ARGV = *DATA;

while (<>) {
  while (/^(.{0,75}\n(^[ \t].{1,75}\n)*)/mg) {
    my $vcard = $1;

    $vcard =~ s/\r?\n[ \t]//g;

    print $vcard;
  }
}

__DATA__
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description
  that exists on a long line.
DESCRIPTION:This is a long descrip
 tion that exists o
 n a long line.

Output:

$ ./try
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description that exists on a long line.

Do you have a file with arbitrary text mixed with vCards?

If all you have is a bunch of vCards in a file and you want to parse them, there are some vCard parsing modules on CPAN.

See, for example, Text::vCard, specifically Text::vCard::Addressbook.

Regarding:

while (<> =~ m/\n./) {
  print($.);
}

This would indeed not match anything, for the simple reason that input is read line by line, so the line that was just read cannot contain anything after its newline.

If there is never more than a single continuation line following each line shorter than 76 characters, the following might fulfill the requirements:
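The point can be checked with a short sketch. The helper count_newline_then_char is hypothetical; it reads a string in the default line mode and counts the records in which a newline is followed by another character.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the records read in default line mode
# ($/ = "\n") that contain a newline followed by another character.
sub count_newline_then_char {
    my ($text) = @_;
    local $/ = "\n";
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\n./;   # the newline is always the last character
    }
    close $fh;
    return $count;
}

print count_newline_then_char("one\n two\nthree\n"), "\n";   # prints 0
```

The same pattern does match when it is applied to the whole string at once, which is why changing $/ or carrying the previous line along is needed.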

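#!/usr/bin/perl

use strict; use warnings;

for
(
    my ($this, $next) = scalar <>;   # prime $this with the first line
    defined ($next = <>);            # read the line that follows it
    $this = $next                    # slide the two-line window forward
)
{
    printf "%s : %d\n", $ARGV, $. - 1 if 76 > length $this and $next =~ /^ /;
    close ARGV if eof;               # reset $. at the end of each input file
}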

