Regex to match capture group multiple times within subsection of text document

Question

I am passing an XML document, as a text document, through a regular expression process.

<YaddaYaddaPrecedingMarkup>includes (a) and (b) and (c) and (d) and ...

<MyElement>SECTIONBEGINS (a) Item A (b) Item B (c) Item C (d) Item D</MyElement>

<YaddaYaddaFollowingMarkup>includes (a) and (b) and (c) and (d) and ...

I want my regular expression to capture the bullet labels '(a)', '(b)', '(c)', '(d)' (and so on) that appear within 'MyElement', whose text begins with "SECTIONBEGINS".

I need this regular expression to ignore any other instances of (a) ... (b) ... (c) appearing elsewhere within my XML-as-text.

If I use:

(\([a-z]\))

I match (a), (b), (c) throughout the document. That expression is too unrestricted.

If I use:

>SECTIONBEGINS(?:.*?)(\([a-z]\))(?:.*)<

I successfully match only within the correct section but I match only '(a)' (the first hit), and not the (b), (c), (d) of that same section.

And I've tried so many other variations, some of which will select the '(d)' instead but none seem to capture more than one hit.

Answer 1

Variant 1: Lookbehind

(?<=SECTIONBEGINS[^>]*)\([a-z]\)

Variant 2: \G anchor + capturing group

(?:SECTIONBEGINS|\G)[^<(]*(\([a-z]\))
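
As a rough illustration of how the two variants behave, here is a minimal sketch written as a C# top-level program. The sample string and the variable names are assumptions modeled on the question's markup, not taken from the original answer. Variant 1 relies on variable-length lookbehind, which the .NET engine supports but many other engines do not; Variant 2 uses \G to continue from where the previous match ended and reads the label from capture group 1.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input modeled on the question's markup.
string sample =
    "<Pre>includes (a) and (b)</Pre>" +
    "<MyElement>SECTIONBEGINS (a) Item A (b) Item B (c) Item C (d) Item D</MyElement>" +
    "<Post>includes (a) and (b)</Post>";

// Variant 1: a label matches only if SECTIONBEGINS precedes it
// with no '>' in between, i.e. inside the same element text.
string[] viaLookbehind = Regex.Matches(sample, @"(?<=SECTIONBEGINS[^>]*)\([a-z]\)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Variant 2: the first match starts at SECTIONBEGINS; each later match
// starts where the previous one ended (\G). The label is in group 1.
string[] viaGAnchor = Regex.Matches(sample, @"(?:SECTIONBEGINS|\G)[^<(]*(\([a-z]\))")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(" ", viaLookbehind)); // (a) (b) (c) (d)
Console.WriteLine(string.Join(" ", viaGAnchor));    // (a) (b) (c) (d)
```

Note that \G also matches at the very start of the input, so if the document itself could begin with a label, a guard such as (?!\A)\G may be worth considering.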

Answer 2

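You need to look at the Captures collection of the group (Match.Groups[...].Captures), which holds every capture made by a repeated group rather than only the last one:

var items = Regex.Match(xml, @">SECTIONBEGINS (?<items>\([a-z]\) .+?)+<")
    .Groups["items"].Captures.Cast<Capture>()
    .Select(x => x.Value)   // each value is a label plus its text, e.g. "(a) Item A "
    .ToList();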
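
To show why Captures matters here, the following sketch contrasts a repeated group's Value (only its last capture, which would explain attempts that return just '(d)') with its full Captures collection. It is a minimal C# top-level program under stated assumptions: the fragment string, the label group name, and the pattern are illustrative and not taken from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical fragment modeled on the question's element text.
string fragment = ">SECTIONBEGINS (a) Item A (b) Item B (c) Item C (d) Item D<";

// The named group "label" sits inside a repeated non-capturing group,
// so it captures once per bullet within a single overall match.
Match sectionMatch = Regex.Match(
    fragment,
    @">SECTIONBEGINS(?:\s*(?<label>\([a-z]\))[^(<]*)+<");

Group labelGroup = sectionMatch.Groups["label"];

// Group.Value exposes only the last capture of a repeated group.
string lastLabel = labelGroup.Value;

// Group.Captures keeps every capture, in match order.
string[] allLabels = labelGroup.Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(lastLabel);                   // (d)
Console.WriteLine(string.Join(" ", allLabels)); // (a) (b) (c) (d)
```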

Or, if you would like to group them into key/value pairs:

var match = Regex.Match(xml, @">SECTIONBEGINS( (\((?<index>[a-z])\) (?<item>.+?)))+<");
var itemsByLabel = Enumerable.Zip(
    match.Groups["index"].Captures.Cast<Capture>(),
    match.Groups["item"].Captures.Cast<Capture>(),
    Tuple.Create)
    .ToDictionary(x => x.Item1.Value, x => x.Item2.Value);

EDIT: If you don't care about the bullet labels, you can extract just the items with:

var itemTexts = Regex.Match(xml, @">SECTIONBEGINS( (\((?<index>[a-z])\) (?<item>.+?)))+<")
    .Groups["item"].Captures.Cast<Capture>()
    .Select(x => x.Value)
    .ToList();

Or, if you want to replace the content in place:

var result = Regex.Replace(xml, @">SECTIONBEGINS( (\((?<index>[a-z])\) (?<item>.+?)))+<",
    m => string.Format(">SECTIONBEGINS {0}<", string.Join(" ", m.Groups["item"]
        .Captures.Cast<Capture>()
        .Select((x, i) => string.Format("({0}) {1}",
            (char)('a' + i),  // relabel sequentially: a, b, c, ...
            x.Value.ToUpper() // TODO: your replace logic here
        )))));

solution1
2 ACCEPTED 2017-09-11 19:34:49

solution2
1 2017-09-11 18:54:13