
multiple regex matches for repeating blocks of multiple lines

Update: I added the code with for (... match in) below.

My source text repeats approximately every 40 lines. Below I show the first 7 lines of each of 2 repetitions. The full data set is here. From the "[Board]" line I need the one or two digits between the quotes; from the "[Dealer]" line I need the single letter between the quotes.

[Board "1"]
[Dealer "N"]
[Vulnerable "None"]
[Deal "N:Q952.652.KJT4.95 T.KQT84.A865.J73 K8763.A7.Q.KQT84 AJ4.J93.9732.A62"]
[Scoring ""]
[Declarer ""]
[Contract ""]

[Board "2"]
[Dealer "E"]
[Vulnerable "NS"]
[Deal "E:K8542.3.4.AT7532 J76.K7.AT85.KQJ8 QT3.AJ84.KJ963.4 A9.QT9652.Q72.96"]
[Scoring ""]
[Declarer ""]
[Contract ""]

The following regex sort of works, but only picks up one match, not the 30+ matches in my text.

NSString *toMatch = @"\\[Board \"([0-9][0-9]?)\"\\].*\\[Dealer \"([NEWS])\"\\]";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:toMatch options:NSRegularExpressionDotMatchesLineSeparators error:&error];
// matchesInString: takes NSMatchingOptions, not creation options, so pass 0 here.
for (NSTextCheckingResult *match in [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])])
{
    // numberOfRanges is an NSUInteger, so print it with %lu and a cast.
    NSLog(@"Number of ranges in match: %lu", (unsigned long)match.numberOfRanges);
    // Range 0 is the whole match; ranges 1 and 2 are the two capture groups.
    for (NSUInteger i = 0; i < match.numberOfRanges; ++i)
    {
        NSRange matchedRange = [match rangeAtIndex:i];
        NSString *tstring = [string substringWithRange:matchedRange];
        NSLog(@"range %lu string: %@", (unsigned long)i, tstring);
    }
}

I suspect the problem is in the linefeeds, but I don't know how to fix it or which options to use. This is a continuation of this question.

How do I fix the regex pattern to get the multiple matches?

(In addition, I need the following on the "[Deal]" line, but let's ignore that for now. I need four separate groups, the first after the ":" and before the space, the second and third are between spaces, and the last is everything after the last space and before the quote.)

I could be wrong, but I think the problem in your pattern is the .* combined with NSRegularExpressionDotMatchesLineSeparators. The .* is greedy, and with the dot also matching line separators it consumes everything up to the last occurrence of [Dealer in the source text, so the whole text comes back as a single match.

You can turn the .* into a "non-greedy" version by using .*? instead. Alternatively, you could avoid the .* altogether and replace it with \\n (assuming the lines of your input are separated by a single \n and the [Dealer] line immediately follows the [Board] line, as in your sample). Note that for the regular expression compiler to see \ followed by n (the recognised escape sequence for a linefeed character), you have to escape the \ in the NSString literal, so you have to write \\n, i.e.:

NSString *toMatch = @"\\[Board \"([0-9][0-9]?)\"\\]\\n\\[Dealer \"([NEWS])\"\\]";

If your source text has Windows line endings, you could use \\r\\n instead, etc.
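As a rough illustration of the non-greedy variant, here is a minimal sketch that collects every board/dealer pair. The helper name BoardDealerPairs and its return shape are hypothetical choices made for this example and do not come from the original post; only the pattern (with .* changed to .*?) and the NSRegularExpression calls are taken from the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns one
// @[board, dealer] pair for each [Board]/[Dealer] block found in text.
static NSArray<NSArray<NSString *> *> *BoardDealerPairs(NSString *text) {
    // .*? is the lazy form of .*, so each match stops at the nearest [Dealer.
    NSString *pattern = @"\\[Board \"([0-9][0-9]?)\"\\].*?\\[Dealer \"([NEWS])\"\\]";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSArray<NSString *> *> *pairs = [NSMutableArray array];
    // Matching options are a separate type from creation options; 0 means none.
    for (NSTextCheckingResult *match in [regex matchesInString:text
                                                       options:0
                                                         range:NSMakeRange(0, text.length)]) {
        // Group 1 is the board number, group 2 is the dealer letter.
        NSString *board = [text substringWithRange:[match rangeAtIndex:1]];
        NSString *dealer = [text substringWithRange:[match rangeAtIndex:2]];
        [pairs addObject:@[board, dealer]];
    }
    return pairs;
}
```

Called on the full text, this should yield one pair per board rather than a single match spanning the whole file, provided every [Board] line is eventually followed by a [Dealer] line.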

Try this (unescaped) pattern:

\[(\w+)\s+\"([^\"]*)\"\]

The first group is the tag name, and the second group is whatever appears between the quotes. In code you should be able to read these values as capture groups 1 and 2 (with NSTextCheckingResult, via rangeAtIndex:).
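A minimal sketch of how this pattern might be applied with NSRegularExpression follows. The helper name TagPairs and its return shape are hypothetical and only serve the example; the pattern is the one above, escaped for an NSString literal.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns one
// @[tagName, value] pair for every [Tag "value"] entry found in text.
static NSArray<NSArray<NSString *> *> *TagPairs(NSString *text) {
    // Same pattern as above, with each backslash doubled for the NSString literal.
    NSString *pattern = @"\\[(\\w+)\\s+\"([^\"]*)\"\\]";
    NSError *error = nil;
    // The pattern contains no dot, so no creation options are needed.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSArray<NSString *> *> *tags = [NSMutableArray array];
    for (NSTextCheckingResult *match in [regex matchesInString:text
                                                       options:0
                                                         range:NSMakeRange(0, text.length)]) {
        // Group 1 is the tag name, group 2 is the text between the quotes.
        NSString *name = [text substringWithRange:[match rangeAtIndex:1]];
        NSString *value = [text substringWithRange:[match rangeAtIndex:2]];
        [tags addObject:@[name, value]];
    }
    return tags;
}
```

The caller can then keep only the pairs whose name is Board or Dealer. The value captured for Deal could presumably be split further on the ":" and the spaces to get the four groups mentioned in the question.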
