
Regex not replacing the last occurrence of a match

I have the following text layout:

Heading
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text 
15 This is more text
Heading    
Chapter 2:1 This is text
2 This is more text...

and I am trying to append, in parentheses right after each Heading, the first chapter reference together with the number of the last line in that chapter. Like so:

Heading (Chapter 1:1-15)
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text 
15 This is more text

I've come up with this regular expression so far:

~s/(?s)(Heading)\r(^\d*\w+\s*\d+:\d+|\d+:\d+)(.*?)(\d+)(.*?\r)(?=Heading)/\1 (\2-\4)\r\2\3\4\5/g;

but this grabs the first number after Chapter 1:1 (i.e. "2", giving "Heading (Chapter 1:1-2)") instead of the last one ("15", as in "Heading (Chapter 1:1-15)"). Could someone please tell me what's wrong with the regex? Thank you!
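
The symptom described above is what a lazy quantifier produces: (.*?) expands one character at a time and stops as soon as the following (\d+) can match, which is the "2" on the next line. The following is a minimal sketch of that behaviour, not the pattern from the question: it assumes \n line endings instead of \r, uses a shortened sample, and both helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both helper names are hypothetical; they only isolate the quantifier behaviour.

# Lazy, as in the question: .*? grows one character at a time and stops at
# the first position where \d+ can match, i.e. the number on the next line.
sub number_lazy {
    my ($text) = @_;
    my ($n) = $text =~ /Chapter \d+:\d+(?s:.*?)(\d+)(?s:.*?\n)(?=Heading)/;
    return $n;
}

# Greedy: .* first runs to the end of the text, then backtracks to the last
# line that starts with a number and is directly followed by a Heading line.
sub number_greedy {
    my ($text) = @_;
    my ($n) = $text =~ /Chapter \d+:\d+(?s:.*)\n(\d+)[^\n]*\n(?=Heading)/;
    return $n;
}

my $block = "Heading\nChapter 1:1 This is text\n2 This is more text\n"
          . "3 This is more text\nHeading\n";

print "lazy:   ", number_lazy($block), "\n";    # lazy:   2
print "greedy: ", number_greedy($block), "\n";  # greedy: 3
```

Simply switching to a greedy .* is not a complete fix, though: with several Heading blocks in the text it would run past the first block and pick up the number from the last one, so the match still has to be confined to one block at a time.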

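An implementation of @FMc's comment could be something like the following (note that the sample data here uses numbered headings, "Heading 1", "Heading 2", and the patterns rely on that):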

#!/usr/bin/perl
use warnings;
use strict;

my $buffer = '';
while (<DATA>) {
    if (/^Heading \d+/) { # process previous buffer, and start new buffer
        process_buffer($buffer);
        $buffer = $_;
    }
    else { # add to buffer
        $buffer .= $_;
    }
}
process_buffer($buffer);   # don't forget last buffer's worth...


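sub process_buffer {
    my ($buf) = @_;    # named $buf to avoid shadowing the special $b used by sort

    return unless length $buf;  # don't bother with an unpopulated buffer

    # On this data, captures the number that starts the last line: without /s,
    # "." cannot cross a newline, so the match has to end on the final line.
    my ($last) = $buf =~ /(\d+)\s.*$/;
    # First reference in the buffer, e.g. "Chapter 1:1".
    my ($chap) = $buf =~ /^(Chapter \d+:\d+)/m;
    $buf =~ s/^(Heading \d+)/$1 ($chap-$last)/;

    print $buf;
}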

__DATA__
Heading 1
Chapter 1:1 This is text
2 This is more text
3 This is more text
4 This is more text
5 This is more text
6 This is more text
7 This is more text
8 This is more text
9 This is more text
10 This is more text
11 This is more text
12 This is more text
13 This is more text
14 This is moret text
15 This is more text
Heading 2
Chapter 2:1 This is text
2 This is more text...
3 This is more text

Edit for updated question

Here's a regex with explanation that will solve your problem. http://codepad.org/mSIYCw4R

~s/
((?:^|\n)Heading)   #Capture Heading (with any preceding newline) into group 1.
                    #We can't use a lookbehind because (?:^|\n) is not fixed-width.
(?=                 #Lookahead: match what follows without consuming it.
  \nChapter\s       #Find the Chapter text.
  (\d+:\d+)         #Capture the first chapter reference into group 2.
  .*                #Match the rest of the Chapter line.
  (?:\n(\d+).+)+    #Match every numbered line that follows.
                    #The last line number matched is left in group 3.
)
/$1 (Chapter $2-$3)/gx;
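
To try the substitution above outside codepad, it can be wrapped in a small script. This is only a sketch under a few assumptions: the text uses \n line endings, heading lines are exactly "Heading" with no trailing whitespace, and add_ranges() and $sample are hypothetical names that do not appear in the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# add_ranges() is a hypothetical helper name, not part of the original answer.
sub add_ranges {
    my ($text) = @_;
    $text =~ s/
        ((?:^|\n)Heading)   # group 1: Heading at the start of the text or of a line
        (?=                 # lookahead: inspect the chapter lines without consuming them
          \nChapter\s
          (\d+:\d+)         # group 2: first reference, e.g. 1:1
          .*                # rest of the Chapter line
          (?:\n(\d+).+)+    # group 3 keeps the number from the last repetition
        )
    /$1 (Chapter $2-$3)/gx;
    return $text;
}

# Shortened sample in the same layout as the question.
my $sample = join "\n",
    'Heading',
    'Chapter 1:1 This is text',
    '2 This is more text',
    '3 This is more text',
    'Heading',
    'Chapter 2:1 This is text',
    '2 This is more text',
    '';

print add_ranges($sample);
```

On this sample the two headings come out as "Heading (Chapter 1:1-3)" and "Heading (Chapter 2:1-2)", with all other lines unchanged. Because the chapter lines are matched inside a lookahead, the last block does not need a following Heading. A heading whose chapter has no numbered line after the Chapter line is left untouched, since the repeated group must match at least once.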
