How to print only the capture groups of all matches in Perl?

How do I print only the capture groups of all matches in Perl? /g doesn't seem to work.

I don't think I'm doing any of it correctly with the if statements, which is why I'm asking. What is the proper way to do it? (I can't find anything on the Internet that helps, and I have been struggling for hours to make it work.)

$LONG_REGEX_WITH_TWO_CAPTURING_GROUPS="";
$file1="file1.html";

# This part is complicated, which is why I said nothing
# about the two groups, but here is the result:
#
# Basically $2 (a letter) + whitespace + $1 (a filename)
# a file.txt
# b anotherfile.txt
# c 3rdfile.txt
# d 4thfile.txt
#
# I want it to become:
# a - (A specific part of the text in file.txt)
# b - (A specific part of the text in anotherfile.txt)
# etc.

my $content1 = do { open my $fh, '<', $file1 or die $!; local $/; <$fh>; };

if ( $content1 =~ /$LONG_REGEX_WITH_TWO_CAPTURING_GROUPS/g ) {
    # Print the letter first ($2).
    print "$2 - ";
    # Open the corresponding file (its name is $1).
    my $content2 = do { open my $fh, '<', $1 or die $!; local $/; <$fh>; };
    # Try to complete the task.
    if ( $content2 =~ /$SECOND_REGEX/g ) {
        print "$1\n"; # There is just one capturing group.
    }
}

However, this only prints the first match, even though it has the global flag.

As in:

a - The desired text.

Never mind the code; the question is very simple: how do I print only the content of the capture groups, but for all of the matches (or make it match everything in the file)?

Thank you!

I'm editing so I can put the code here:

#!/usr/bin/perl

$file1="file1.html";
my $content1 = do { open my $fh, '<', $file1 or die $!; local $/; <$fh>; };

foreach ( $content1 =~ m/LONG_REGEX_WITH_TWO_CAPTURING_GROUPS/g ) {
    # If I were to put a print "$content1"; here, the program would have
    # no output. Here is the problem, the question still remains.
    print "$2 - ";
    my $content2 = do { open my $fh, '<', $1 or die $!; local $/; <$fh>; };
    foreach ( $content2 =~ m/SECOND_REGEX>/g ) {
        print "$1\n"; # There is just one capturing group.
    }
}

This worked for me:

#!/usr/bin/perl

$file1="file1.html";
my $content1 = do { open my $fh, '<', $file1 or die $!; local $/; <$fh>; };
while ( $content1 =~ /LONG_REGEX_WITH_TWO_CAPTURING_GROUPS/g ) {
    print "$2 - ";
    my $content2 = do { open my $fh, '<', "../../VT/$1" or die $!; local $/; <$fh>; };
    while ( $content2 =~ /SECOND_REGEX/g ) {
        print "$1\n\n<br/>"; # There is just one capturing group.
    }
}

You want to iterate over the matches, but you have no loop. In list context, a match with the g modifier returns a list of all matches, and you have to iterate over that list.

@matches = ( 'foo' =~ m{o}g );

This gives you an array with two "o" strings in it.

You can iterate over the matches with code like this:

foreach ( 'foo' =~ m{o}g ) { ... }
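When the pattern contains capture groups, the list returned by a /g match in list context holds the captures rather than the whole matches, flattened into one list. The following is a minimal sketch of that behaviour; the sample text and the pattern are made up for illustration and are not the regex from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data (not from the question): "<filename> <letter>" per line.
my $text = "file.txt a\nanotherfile.txt b\n";

# List context + /g: all captures of all matches come back as one flat list,
# here ('file.txt', 'a', 'anotherfile.txt', 'b').
my @captures = $text =~ /^(\S+)\s+([a-z])$/mg;

# Step through the flat list two elements at a time to rebuild each pair.
my @lines;
for ( my $i = 0; $i < @captures; $i += 2 ) {
    my ( $name, $letter ) = @captures[ $i, $i + 1 ];
    push @lines, "$letter - $name";
}
print "$_\n" for @lines;
```

With two groups per match, the list always has two slots per match, so stepping by two keeps the pairs aligned.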

If you want to iterate and need only the captured group, you must call the match operator in scalar context. Here is an example of how to do it:

$str="a m7 bcd 9 m2 cde m3";
while ($str =~ m{m(\d)}g) {
   print "$1\n";
}

This prints 7, 2 and 3, each on its own line.
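The same while loop extends to a pattern with two capture groups and to a nested match, which is the shape of the task in the question. The sketch below is only an illustration under stated assumptions: the index string, the %content_of hash (standing in for the files on disk) and both patterns are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: an index string, and a hash playing the role of the files.
my $index = "file.txt a\nanotherfile.txt b\n";
my %content_of = (
    'file.txt'        => 'x <b>first</b> y <b>second</b>',
    'anotherfile.txt' => '<b>third</b>',
);

my @out;
# Scalar context + /g: each pass finds the next match in $index.
while ( $index =~ /^(\S+)\s+([a-z])$/mg ) {
    my ( $name, $letter ) = ( $1, $2 );    # copy first: the inner match overwrites $1
    while ( $content_of{$name} =~ /<b>(.*?)<\/b>/g ) {
        push @out, "$letter - $1";         # $1 now belongs to the inner match
    }
}
print "$_\n" for @out;
```

Copying $1 and $2 into named variables before running another regex is the design choice that matters here, because every successful match resets the capture variables.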

It is not clear to me how exactly that regex is meant to work, but here are two possible situations.

It seems that you have a regex that matches multiple (two) subpatterns within one large pattern. Then you don't need the /g modifier; when the large pattern matches, the subpatterns are matched as well (and captured as needed). You can then use the match operator in list context so that it returns these captures, instead of the true/false it returns in scalar context.

my $string = q(73 name);

my @matches = $string =~ /([0-9]+) \s* ([a-z]+)/xi;

if (@matches) {
    # it matched, process the two captures
}

This can be done inside the condition of the if statement:

if (my @matches = $string =~ /([0-9]+)\s*([a-z]+)/i) { 
    # getting here only means that there were *some* matches
    # check @matches as suitable, process
}

Now this whole thing is scoped to the if statement; there is no @matches variable outside.

Or, in this case, you can simply use the capture variables, like

if ( $string =~ /([0-9]+) \s* ([a-z]+)/xi ) {
    # use $1 and $2 (check whether they were both defined)
}

See more on regex operators in perlop, and look over the reference perlre.


Another possibility is that a regex pattern needs to be matched multiple times in the string, as the engine goes along the string and parses it. For that you do indeed need the /g modifier.

It says in "Global matching" in perlretut:

[...] The modifier /g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have /g jump from match to match, keeping track of position in the string as it goes along. [...]
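The position that /g keeps track of can be inspected with the built-in pos function. This is a small sketch that reuses the string from the earlier while example; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $str = "a m7 bcd 9 m2 cde m3";
my @positions;

while ( $str =~ /m(\d)/g ) {
    # pos($str) is the offset just past the end of the latest match,
    # which is where the next attempt will start.
    push @positions, pos($str);
}

# The final, failed attempt resets the position, so pos() is undef here.
my $after = pos($str);
print "@positions\n";    # prints "4 13 20"
```

Because the position is stored on the string itself, an if that runs the match only once simply stops after the first match.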

Since you need both matches at the same time for processing, you need to match in list context and capture the returned matches into an array, then process that array. For example

my $string = '1 one 2 two';
my @matches = $string =~ /([a-z]+)/gi;  # @matches has elements: ('one', 'two')
# check how many @matches, etc

or perhaps have it inside an if, like above

if (my @matches = $string =~ /([a-z]+)/gi) {
    # check, process...
}

Add suitable checks for what was caught in @matches before processing it.
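What counts as a suitable check depends on the pattern. As one possible sketch, assume a hypothetical pattern with two groups where the second is optional: a group that did not take part in a match comes back as undef, so the list should be non-empty, hold two slots per match, and have each pair checked before use.

```perl
use strict;
use warnings;

my $string = '1 one 2 two 3';    # hypothetical input; the last number has no word

# The second group is optional, so its slot can come back as undef.
my @matches = $string =~ /([0-9]+)(?:\s+([a-z]+))?/gi;

my @pairs;
if ( @matches && @matches % 2 == 0 ) {    # non-empty, and two slots per match
    while ( my ( $num, $word ) = splice( @matches, 0, 2 ) ) {
        next unless defined $word;        # skip a match whose word is missing
        push @pairs, "$num=$word";
    }
}
print "$_\n" for @pairs;
```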


Comments on the code posted in the question

When you have the regex as the condition of an if statement, it is in scalar context. This means, as the quote from the docs above shows, that it returns matches one by one, if invoked repeatedly. In your if it runs once, so you only get the first match. If the pattern really has a single group that is meant to match twice, $2 is then undef.

When you have it under foreach (in the Edit), it is indeed in list context, but foreach obtains the whole list from the statement (so both matches) and then iterates through it. So each time through you only have one of the matches on hand. Again no good.
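A small sketch makes the foreach behaviour visible. The sample text and pattern are hypothetical; the point is that all matching finishes before the loop body runs, so $_ holds one captured string per pass while $1 and $2 are left over from the last successful match.

```perl
use strict;
use warnings;

# Hypothetical sample data: "<filename> <letter>" per line.
my $text = "file.txt a\nanotherfile.txt b\n";

my @seen;
foreach ( $text =~ /^(\S+)\s+([a-z])$/mg ) {
    # $_ is a single capture; $1 and $2 never change inside the loop.
    push @seen, "$_ | $1 | $2";
}
print "$_\n" for @seen;
```

The loop runs four times (once per capture, not once per match), and every pass sees the same $1 and $2, namely those of the final match.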
