Using a regex in Perl to match the last occurrence

Question

I have text like this:

hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of 
/* any string */ select this part on
ly 
////RESULT rest string

The text spans multiple lines, and I need to extract everything from the last occurrence of "*/" up to "////RESULT". In this case, the result should be:

 select this part on
ly

How can I do this in Perl?

I tried \*/(.|\n)*////RESULT, but that matches starting from the first "*/".
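Why the attempt starts too early: a Perl regex reports the leftmost position at which the whole pattern can succeed, and greediness only decides how far the match extends from there. The sketch below (sample text copied from above, trailing spaces dropped) checks this by comparing the start offset of the match, read from the built-in @- array, with the offsets of the first and last "*/".

```perl
use strict;
use warnings;

my $string = <<'END';
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of
/* any string */ select this part on
ly
////RESULT rest string
END

# The attempted pattern: (.|\n)* is greedy, but greediness only affects
# how far the match extends, not where it begins.
my $matched = $string =~ m!\*/(.|\n)*////RESULT!;
my $start   = $matched ? $-[0] : -1;    # offset where the whole match begins

my $first = index($string, '*/');       # offset of the first "*/"
my $last  = rindex($string, '*/');      # offset of the last "*/"

print "match starts at $start, first */ at $first, last */ at $last\n";
```

The match start equals the offset of the first "*/", not the last one.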

Answer 1

A useful trick in cases like this is to prefix the regexp with the greedy pattern .*, which will try to match as many characters as possible before the rest of the pattern matches, so the rest of the pattern matches as late in the string as it can. So:

my ($match) = ($string =~ m!^.*\*/(.*?)////RESULT!s);

Let's break this pattern into its components:

- ^.* starts at the beginning of the string and matches as many characters as it can. (The s modifier lets . match newlines as well.) The beginning-of-string anchor ^ is not strictly necessary, but it ensures that the regexp engine won't waste too much time backtracking if the match fails.
- \*/ just matches the literal string */.
- (.*?) matches and captures any number of characters; the ? makes it non-greedy, so it prefers to match as few characters as possible in case there is more than one position where the rest of the regexp can match.
- Finally, ////RESULT just matches itself.
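To see both halves of the trick at work, here is a small sketch on a made-up string (not from the question) with two "*/" markers and two "////RESULT" markers. The greedy ^.* leaves \*/ to match the last "*/", and the non-greedy (.*?) then stops at the first "////RESULT" after it; a greedy (.*) is included only for comparison.

```perl
use strict;
use warnings;

# Made-up input: two "*/" markers and two "////RESULT" markers.
my $string = "p */ q */ r ////RESULT s ////RESULT t";

# Answer 1's pattern: the greedy ^.* leaves \*/ to match the LAST "*/",
# and the non-greedy (.*?) stops at the FIRST "////RESULT" after it.
my ($lazy) = $string =~ m!^.*\*/(.*?)////RESULT!s;

# For comparison, a greedy (.*) runs on to the LAST "////RESULT".
my ($greedy) = $string =~ m!^.*\*/(.*)////RESULT!s;

print "lazy:   [$lazy]\n";      # prints: lazy:   [ r ]
print "greedy: [$greedy]\n";    # prints: greedy: [ r ////RESULT s ]
```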

Since the pattern contains a lot of slashes, and since I wanted to avoid leaning toothpick syndrome, I used alternative regexp delimiters. Exclamation points (!) are a popular choice, since they rarely clash with regexp syntax (the exceptions are the negative lookaround assertions (?!...) and (?<!...)).
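For comparison, a sketch of the same pattern written with three different delimiters on a made-up string: with / every slash in the pattern has to be escaped, while ! and {} leave the slashes alone. All three capture the same text.

```perl
use strict;
use warnings;

my $string = "a */ b ////RESULT";    # made-up input

# The same pattern with three different delimiters.
my ($slash) = $string =~ m/\*\/(.*?)\/\/\/\/RESULT/s;    # every / escaped
my ($bang)  = $string =~ m!\*/(.*?)////RESULT!s;          # no escaping needed
my ($brace) = $string =~ m{\*/(.*?)////RESULT}s;          # bracketing delimiters work too

print "[$slash] [$bang] [$brace]\n";
```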

Edit: Following the discussion with ikegami below, I should note that if you want to use this regexp as a sub-pattern in a longer regexp, and you want to guarantee that the string matched by (.*?) never contains ////RESULT, then you should wrap those parts of the regexp in an independent (?>) subexpression, like this:

my $regexp = qr!\*/(?>(.*?)////RESULT)!s;
# ...
my ($match) = ($string =~ /^.*$regexp$some_other_regexp/s);    # list context, so $match gets the capture

The (?>) causes the pattern inside it to fail rather than accept a suboptimal match (i.e. one that extends beyond the first substring matching ////RESULT), even if that means the rest of the regexp will fail to match.
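A sketch of the difference on a made-up string. The value given to $some_other_regexp here, qr/ END/, is a hypothetical stand-in chosen so that the rest of the regexp can only match after the second "////RESULT": without (?>), the lazy (.*?) grows past the first "////RESULT" to satisfy it; with (?>), the whole match fails instead.

```perl
use strict;
use warnings;

my $string = "x */ a ////RESULT b ////RESULT END";    # made-up input

# Hypothetical stand-in for $some_other_regexp: the text must continue with " END".
my $some_other_regexp = qr/ END/;

my $plain  = qr!\*/(.*?)////RESULT!s;        # (.*?) may be extended on backtracking
my $atomic = qr!\*/(?>(.*?)////RESULT)!s;    # (.*?) frozen at the first ////RESULT

my ($from_plain)  = $string =~ /^.*$plain$some_other_regexp/s;
my ($from_atomic) = $string =~ /^.*$atomic$some_other_regexp/s;

print "plain:  [$from_plain]\n";    # prints: plain:  [ a ////RESULT b ]
print "atomic: ", (defined $from_atomic ? "[$from_atomic]" : "no match"), "\n";
```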

Answer 2

(?:(?!STRING).)*

matches any sequence of characters that does not contain STRING. It's like [^a], but for strings instead of single characters.
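A small sketch of the difference from a character class, on a made-up string that has a lone "*" before "*/": [^*]* stops at the first "*" character, while (?:(?!\*/).)* stops only at the first "*/" sequence.

```perl
use strict;
use warnings;

my $text = "a * b */ c";    # made-up input with a lone "*" before "*/"

# Character class: stops at the first "*" character.
my ($by_char) = $text =~ m{^([^*]*)};

# Negative lookahead before each character: stops just before the first "*/".
my ($by_string) = $text =~ m{^((?:(?!\*/).)*)}s;

print "[$by_char]\n";      # prints: [a ]
print "[$by_string]\n";    # prints: [a * b ]
```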

You can take shortcuts if you know certain inputs won't be encountered (like Kenosis and Ilmari Karonen did), but this is what matches what you specified:

my ($segment) = $string =~ m{
    \*/
    ( (?: (?! \*/ ). )* )
    ////RESULT
    (?: (?! \*/ ). )*
    \z
}xs;

If you don't care whether */ appears after ////RESULT, the following is the safest:

my ($segment) = $string =~ m{
    \*/
    ( (?: (?! \*/ ). )* )
    ////RESULT
}xs;
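A sketch contrasting the two patterns above on a made-up string in which "*/" appears again after "////RESULT": the first pattern, which requires no "*/" between ////RESULT and \z, rejects it, while the second one still captures the segment.

```perl
use strict;
use warnings;

# Made-up input: "*/" shows up again after ////RESULT.
my $string = "a */ b ////RESULT c */ d";

# First pattern: nothing after ////RESULT may contain "*/" (note the \z).
my ($strict) = $string =~ m{
    \*/
    ( (?: (?! \*/ ). )* )
    ////RESULT
    (?: (?! \*/ ). )*
    \z
}xs;

# Second pattern: does not look past ////RESULT at all.
my ($relaxed) = $string =~ m{
    \*/
    ( (?: (?! \*/ ). )* )
    ////RESULT
}xs;

print defined $strict ? "strict:  [$strict]\n" : "strict:  no match\n";
print "relaxed: [$relaxed]\n";    # prints: relaxed: [ b ]
```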

You didn't specify what should happen if two ////RESULT markers follow the last */. The above matches up to the last one. If you wanted to match up to the first one, you'd use

my ($segment) = $string =~ m{
    \*/
    ( (?: (?! \*/ | ////RESULT ). )* )
    ////RESULT
}xs;
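A sketch running both variants on a made-up string with two "////RESULT" markers after the last "*/"; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up input: two "*/" markers, then two "////RESULT" markers.
my $string = "x */ y */ a ////RESULT b ////RESULT c";

# Matches up to the LAST ////RESULT after the last "*/".
my ($to_last) = $string =~ m{
    \*/
    ( (?: (?! \*/ ). )* )
    ////RESULT
}xs;

# Matches up to the FIRST ////RESULT after the last "*/".
my ($to_first) = $string =~ m{
    \*/
    ( (?: (?! \*/ | ////RESULT ). )* )
    ////RESULT
}xs;

print "to last:  [$to_last]\n";     # prints: to last:  [ a ////RESULT b ]
print "to first: [$to_first]\n";    # prints: to first: [ a ]
```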

Answer 3

Here's one option:

use strict;
use warnings;

my $string = <<'END';
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of 
/* any string */ select this part on
ly 
////RESULT
END

my ($segment) = $string =~ m!\*/([^/]+)////RESULT$!s;

print $segment;

Output:

 select this part on
ly
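The [^/]+ in this answer is the kind of shortcut Answer 2 refers to: it assumes the wanted segment contains no "/". A sketch on a made-up input that breaks that assumption, compared with Answer 1's pattern:

```perl
use strict;
use warnings;

# Made-up input: the wanted segment contains a "/".
my $string = "a */ 1/2 of this\n////RESULT\n";

my ($shortcut) = $string =~ m!\*/([^/]+)////RESULT$!s;    # Answer 3's pattern
my ($general)  = $string =~ m!^.*\*/(.*?)////RESULT!s;    # Answer 1's pattern

print defined $shortcut ? "shortcut: [$shortcut]\n" : "shortcut: no match\n";
print "general:  [$general]\n";
```

Here the shortcut finds no match, while Answer 1's pattern still captures " 1/2 of this" plus the trailing newline.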

3 answers

Answer 1
Score 18, accepted, 2013-01-02 19:00:13

Answer 2
Score 4, 2013-01-02 18:57:52

Answer 3
Score 2, 2013-01-02 18:44:05
