Regex not replacing the last occurrence of a match

I have the following text layout:

Heading
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text 
15 This is more text
Heading    
Chapter 2:1 This is text
2 This is more text...

I am trying to add the first Chapter reference and the last one in that Chapter, in parentheses, right after the Heading. Like so:

Heading (Chapter 1:1-15)
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text 
15 This is more text

I've come up with this regular expression so far:

~s/(?s)(Heading)\r(^\d*\w+\s*\d+:\d+|\d+:\d+)(.*?)(\d+)(.*?\r)(?=Heading)/\1 (\2-\4)\r\2\3\4\5/g;

But this grabs the first number right after Chapter 1:1 (i.e. "2", giving "Heading (Chapter 1:1-2)") instead of the last one ("15", as in "Heading (Chapter 1:1-15)"). Could someone please tell me what's wrong with the regex? Thank you!

An implementation of @FMc's comment could be something like:

#!/usr/bin/perl
use warnings;
use strict;

my $buffer = '';
while (<DATA>) {
    if (/^Heading \d+/) { # process previous buffer, and start new buffer
        process_buffer($buffer);
        $buffer = $_;
    }
    else { # add to buffer
        $buffer .= $_;
    }
}
process_buffer($buffer);   # don't forget last buffer's worth...


sub process_buffer {
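    my ($text) = @_;   # named $text rather than $b, which sort() treats specially

    return unless length $text;  # don't bother with an unpopulated buffer

    # There is no /m here, so $ can only match at the end of the buffer;
    # this picks up the number on the final line.
    my ($last) = $text =~ /(\d+)\s.*$/;
    # First chapter reference in the buffer, e.g. "Chapter 1:1"
    my ($chap) = $text =~ /^(Chapter \d+:\d+)/m;
    $text =~ s/^(Heading \d+)/$1 ($chap-$last)/;

    print $text;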
}

__DATA__
Heading 1
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text
15 This is more text
Heading 2
Chapter 2:1 This is text
2 This is more text...
3 This is more text

Edit for updated question

Here's a regex, with explanation, that will solve your problem: http://codepad.org/mSIYCw4R

~s/
((?:^|\n)Heading)   #Capture Heading into group 1, at the start of the text or of a line.
                    #A lookbehind is not used here because of (?:^|\n)
(?=                 #A lookahead: it matches without consuming, but its groups still capture.
  \nChapter\s       #Find the Chapter text.
  (\d+:\d+)         #Capture the first chapter reference into group 2.
  .*                #Match the rest of the Chapter line.
  (?:\n(\d+).+)+    #Match every following numbered line.
                    #Group 3 is overwritten on each repetition, so it ends up holding the last number.
)
/$1 (Chapter $2-$3)/gx;
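
To try the substitution above on its own, here is a minimal sketch that wraps it in a sub and runs it on a shortened copy of the sample text. The name add_ranges and the $sample data are made up for this illustration and are not part of the original answer. It uses s{...}{...} delimiters instead of slashes, and it assumes the whole document is in one string with "\n" line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): applies the
# substitution to a string that holds the whole document.
sub add_ranges {
    my ($text) = @_;
    $text =~ s{
        ((?:^|\n)Heading)     # group 1: Heading at the start of the text or of a line
        (?=                   # lookahead: inspect the chapter without consuming it
          \nChapter\s
          (\d+:\d+)           # group 2: first reference, such as 1:1
          .*                  # rest of the first chapter line
          (?:\n(\d+).+)+      # following numbered lines; group 3 keeps the last number
        )
    }{$1 (Chapter $2-$3)}gx;
    return $text;
}

# Shortened sample data, assumed to use "\n" line endings.
my $sample = join "\n",
    'Heading',
    'Chapter 1:1 This is text',
    '2 This is more text',
    '3 This is more text',
    'Heading',
    'Chapter 2:1 This is text',
    '2 This is more text',
    '';

print add_ranges($sample);
```

With this sample, the two headings should come out as "Heading (Chapter 1:1-3)" and "Heading (Chapter 2:1-2)". The pattern expects a newline directly after Heading, so a heading line with trailing spaces, or a file that uses "\r" line endings as in the question's regex, would need the pattern adjusted before it matches.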
