
Regex to create two capture groups, where the second captures multiple times

My test string is

thread_id=1152236, geo_locality.nomv="Seattle|||San Francisco|||Chicago", user_reference_count=0

Is it possible for a single regex to have two capture groups, where the second capture group captures multiple times?

I want the first capture group to capture geo_locality (without hardcoding it) and the second capture group to capture Seattle, San Francisco, and Chicago.

The closest I got was

(?<key>\w+)\.nomv="(?<val>.+?)(?=\|\|\||")

https://regex101.com/r/wmxg4x/1

Except the val capturing group also needs to capture the other cities.

The answer depends on the regex flavor at play.

  1. Using \G to continue at the end of the previous match with PCRE

     (?<key>\w+)\.nomv="|(?!^)(?<=\G)(?<val>.+?)(?:\|\|\||")

    Demo

    The \G anchor can be a bit arcane and truly magical at the same time.

Explanation:

  • (?<key>\w+)\.nomv="| the key and the literal in the first alternative act as a start anchor
  • The \G anchor asserts the position at the end of the previous match, or at the start of the string for the first match.

    • To exclude the start of the string, I've added (?!^); this prevents matches before .nomv=" has been found.
    • (?<=\G) means we can only continue if there was a match directly before
    • (?<val>.+?) captures each city block as required
    • (?:\|\|\||") the non-capturing group is simply used to move the cursor forward
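For flavors without \G, the same "continue where the previous match ended" idea can be emulated by hand. The following is a minimal Python sketch, not the PCRE solution itself: Python's re module has no \G, so it finds the key first and then anchors each value match at the end of the previous one with the pos argument of a compiled pattern's match(). The function name key_and_values is made up for illustration, named groups use Python's (?P<name>...) spelling, and the loop stops at the closing quote, which is an assumption about the desired behavior.

```python
import re

text = ('thread_id=1152236, geo_locality.nomv="Seattle|||San Francisco|||Chicago", '
        'user_reference_count=0')

# First alternative of the PCRE pattern: the key plus the literal start anchor.
key_re = re.compile(r'(?P<key>\w+)\.nomv="')
# Second alternative: one value followed by a separator or the closing quote.
val_re = re.compile(r'(?P<val>.+?)(?:\|\|\||")')


def key_and_values(s):
    """Hypothetical helper: return (key, [values]) or (None, []) if nothing matches."""
    start = key_re.search(s)
    if start is None:
        return None, []
    values = []
    pos = start.end()              # plays the role of \G: end of the previous match
    while True:
        m = val_re.match(s, pos)   # match() only succeeds exactly at pos
        if m is None:
            break
        values.append(m.group('val'))
        if m.group(0).endswith('"'):
            break                  # assumption: the closing quote ends the value list
        pos = m.end()
    return start.group('key'), values


print(key_and_values(text))
# ('geo_locality', ['Seattle', 'San Francisco', 'Chicago'])
```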

  2. Using Captures with .NET

     (?<key>\w+)\.nomv="(?:(?<val>.+?)(?:\|\|\||"))*

    Demo

    This is not a real challenge for .NET. Wrap the value part in a group, add a quantifier, and let (?<val>) match multiple times. Then read the values from the group's Captures collection.
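For contrast, here is a small sketch of the same quantified pattern in a flavor that has no Captures collection. It uses Python's standard re module (with its (?P<name>...) group syntax) and is only meant to show why the .NET feature matters: a repeated group keeps just its last iteration. The third-party regex package for Python offers a captures() method that behaves more like .NET, but that is outside the standard library and not shown here.

```python
import re

text = ('thread_id=1152236, geo_locality.nomv="Seattle|||San Francisco|||Chicago", '
        'user_reference_count=0')

# Same structure as the .NET pattern: the val group sits inside a repeated group.
pattern = re.compile(r'(?P<key>\w+)\.nomv="(?:(?P<val>.+?)(?:\|\|\||"))*')

m = pattern.search(text)
print(m.group('key'))  # geo_locality
print(m.group('val'))  # Chicago (earlier iterations are overwritten)
```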


You could use an alternation and, if your regex flavor supports it, a positive lookbehind (?<=...)

(?<key>\w+)(?=\.nomv=")|(?<val>(?<=\.nomv=")[A-Za-z ]+|(?<=\|\|\|)[A-Za-z ]+)

Explanation

  • (?<key> Named capture group
    • \w+ Match a word character one or more times
  • ) Close named capture group
  • (?=\.nomv=") Positive lookahead that asserts that what follows is .nomv="
  • | Or
  • (?<val> Named capture group
    • (?<=\.nomv=") Positive lookbehind that asserts that what is on the left is .nomv="
    • [A-Za-z ]+ Match one or more uppercase letters, lowercase letters, or spaces
    • | Or
    • (?<=\|\|\|) Positive lookbehind that asserts that what is on the left is |||
    • [A-Za-z ]+ Match one or more uppercase letters, lowercase letters, or spaces
  • ) Close named capture group
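As a rough illustration of how the matches from this pattern could be consumed, here is a Python sketch. It assumes Python's re module, so the named groups are written (?P<name>...); both lookbehinds are fixed-width, which is what re requires. Each match fills either key or val, never both, so the loop sorts them accordingly.

```python
import re

text = ('thread_id=1152236, geo_locality.nomv="Seattle|||San Francisco|||Chicago", '
        'user_reference_count=0')

pattern = re.compile(
    r'(?P<key>\w+)(?=\.nomv=")'                                   # key before .nomv="
    r'|(?P<val>(?<=\.nomv=")[A-Za-z ]+|(?<=\|\|\|)[A-Za-z ]+)'   # value after .nomv=" or |||
)

key = None
values = []
for m in pattern.finditer(text):
    if m.group('key') is not None:
        key = m.group('key')
    else:
        values.append(m.group('val'))

print(key)     # geo_locality
print(values)  # ['Seattle', 'San Francisco', 'Chicago']
```

Note that [A-Za-z ]+ only covers letters and spaces, so city names with hyphens, digits, or accented characters would need a wider character class.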

Try this pattern: (?<key>\w+)\.nomv="(?<val>(.+?\|\|\|)+.+)"

Note that the val group contains a nested capturing group, but val is the one you need.

See DEMO.
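With this pattern, val holds the whole delimited value as one string rather than the individual cities, so a second step is needed to separate them. A minimal Python sketch of that follow-up, assuming Python's re module (named groups spelled (?P<name>...)) and a plain str.split on the ||| delimiter:

```python
import re

text = ('thread_id=1152236, geo_locality.nomv="Seattle|||San Francisco|||Chicago", '
        'user_reference_count=0')

pattern = re.compile(r'(?P<key>\w+)\.nomv="(?P<val>(.+?\|\|\|)+.+)"')

m = pattern.search(text)
key = m.group('key')
cities = m.group('val').split('|||')  # split the single captured string

print(key)     # geo_locality
print(cities)  # ['Seattle', 'San Francisco', 'Chicago']
```

Two caveats with the pattern as written: it requires at least one ||| separator, so a value with a single city would not match, and the greedy .+ runs to the last double quote on the line, which could overshoot if other quoted fields follow.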

