
Regex - Extract Words within brackets and quotes if it starts with a keyword only

I have the following string:

[The quick] brown fox [mykey*="is a super-fast9"] animal [mykey^="that"] can run "very rapid" and [otherkey="effortlessly"].

I need to extract the words (separated by spaces) inside double quotes that are, at the same time, inside brackets starting with a specific keyword (mykey).

So far I have:

The quick

mykey*="is

a

super-fast9"

mykey^="that"

otherkey="effortlessly"

But I want:

is

a

super-fast9

that

Example Link: https://regex101.com/r/zmNse1/2

The solution offered by Wiktor is the most logical one to use, but for the sake of a regex challenge, see this pattern: \[(?!mykey)[^\[]+|([^\s\[=\"]+)(?=[^\"]*\"\]) and check group #1.

\[                  # "["
(?!                 # Negative Look-Ahead
  mykey             # "mykey"
)                   # End of Negative Look-Ahead
[^\[]               # Character not in [\[] Character Class
+                   # (one or more)(greedy)
|                   # OR
(                   # Capturing Group (1)
  [^\s\[=\"]        # Character not in [\s\[=\"] Character Class
  +                 # (one or more)(greedy)
)                   # End of Capturing Group (1)
(?=                 # Look-Ahead
  [^\"]             # Character not in [\"] Character Class
  *                 # (zero or more)(greedy)
  \"                # """
  \]                # "]"
)                   # End of Look-Ahead
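
To try the alternation pattern above from C#, here is a minimal sketch. The class and method names (GroupOneDemo, ExtractWords) are made up for illustration and are not part of the original answer. The first alternative consumes brackets that do not start with mykey without capturing anything, so only matches where group 1 actually participated are kept.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names are assumptions.
public static class GroupOneDemo
{
    // Alternative 1 skips "[...]" blocks that do not start with "mykey".
    // Alternative 2 captures a word that is followed, within the same
    // quoted value, by the closing "] sequence.
    private const string Pattern =
        @"\[(?!mykey)[^\[]+|([^\s\[=""]+)(?=[^""]*""\])";

    public static List<string> ExtractWords(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Where(m => m.Groups[1].Success)   // drop the "skip" matches
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

On the sample string from the question, GroupOneDemo.ExtractWords should return is, a, super-fast9 and that.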

You may match the substrings you need with a relatively simple regex and capture the parts between quotes, and then split the captures with 1 or more whitespace pattern:

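using System;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1 captures the text between the double quotes
var pattern = "\\[mykey[^][=]+=\"([^\"]*)\"]";
var s = "[The quick] brown fox [mykey*=\"is a  super-fast9\"] animal [mykey^=\"that\"] can run \"very rapid\".";
var result = Regex.Matches(s, pattern)
    .Cast<Match>()
    // Split each captured value on spaces, dropping empty entries
    .SelectMany(v => v.Groups[1].Value.Trim().Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries))
    .ToList();
Console.WriteLine(string.Join("\n", result));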


The pattern is

\[mykey[^][=]+="([^"]*)"]


Pattern details

  • \[ - a literal [
  • mykey - a literal substring
  • [^][=]+ - 1 or more chars other than [ , ] and =
  • = - an equal sign
  • " - a double quote
  • ([^"]*) - Group 1: any 0+ chars other than "
  • "] - a literal "] substring.

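Note that the captured value is first trimmed of leading/trailing whitespace (with .Trim() ). Then .Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries) splits the Group 1 value on spaces and drops empty entries, so consecutive spaces do not produce empty values in the result. To split on any whitespace (tabs, line breaks) rather than only spaces, Regex.Split with @"\s+" (1 or more whitespace chars) can be used instead.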
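
If the quoted values may contain tabs or line breaks, a variant that splits with Regex.Split and \s+ might look like the sketch below. The class and method names (WhitespaceSplitDemo, ExtractWords) are assumptions for illustration only. Note that [^][=]+ requires at least one character between mykey and the equal sign, so a plain [mykey="..."] would not match this pattern.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names are assumptions.
public static class WhitespaceSplitDemo
{
    // Same pattern as in the answer: Group 1 holds the text between the quotes.
    private const string Pattern = @"\[mykey[^][=]+=""([^""]*)""]";

    public static List<string> ExtractWords(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            // Split each captured value on runs of any whitespace.
            .SelectMany(m => Regex.Split(m.Groups[1].Value, @"\s+"))
            // Regex.Split yields empty strings for leading/trailing whitespace.
            .Where(w => w.Length > 0)
            .ToList();
    }
}
```

Filtering out empty strings afterwards replaces both the .Trim() call and StringSplitOptions.RemoveEmptyEntries from the original code.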

This regex should do what you want:
(?<=\[mykey.?="[^]]*)[\w-]+(?=[^]]*"\])


I assumed there cannot be nested brackets. I also didn't know what to do with the ^ or * between mykey and the =, so I allowed one optional character of any kind ( .? ) there.
You might need to escape the backslashes in your code.
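
As a rough sketch of how this lookbehind regex could be used from C#: a verbatim string avoids escaping the backslashes, and only the double quotes need doubling. The class and method names (LookbehindDemo, ExtractWords) are hypothetical. The pattern relies on a variable-length lookbehind, which .NET supports but some other engines (for example PCRE and Python's re module) reject.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names are assumptions.
public static class LookbehindDemo
{
    // Lookbehind: the word must be preceded by [mykey, one optional
    // character, =" and no closing bracket in between.
    // Lookahead: the word must be followed by "] with no ] before it.
    private const string Pattern =
        @"(?<=\[mykey.?=""[^]]*)[\w-]+(?=[^]]*""\])";

    public static List<string> ExtractWords(string input)
    {
        // Each match is one word, so no capture group is needed.
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Because each word is matched on its own, no splitting step is required. Words containing characters outside [\w-] (an apostrophe, say) would be broken into pieces.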

For what it's worth: since others mentioned string parsing, I thought I'd give one implementation of that here. String parsing solutions are always longer-winded, but they can be much faster than regular expressions. As a guy who uses regex a LOT, I can still say that I prefer string functions where possible. The only complications with this answer are that you have to know what your assignment operators are, and you can't have escaped double quotes in your string value. I wrote it fairly verbosely, though you could cut out some conditionals or shorten some lines if you wanted fewer bytes of code.

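using System;
using System.Collections.Generic;
using System.Linq;

List<string> GetValuesByKeyword(string keyword, string input)
{
    var vals = new List<string>();
    int startIndex = input.IndexOf('[');
    while (startIndex >= 0)
    {
        var newValue = "";
        if (startIndex < input.Length - 1)
        {
            // Everything after the opening bracket, e.g. mykey*="is a super-fast9"] animal ...
            var squareKey = input.Substring(startIndex + 1).Trim();
            if (squareKey.StartsWith(keyword, StringComparison.Ordinal))
            {
                var squareAssign = squareKey.Substring(keyword.Length).Trim();
                var assignOp = StartsWithWhich(squareAssign, "=", "+=", "-=", "*=", "/=", "^=", "%=");
                if (!string.IsNullOrWhiteSpace(assignOp))
                {
                    var quotedVal = squareAssign.Substring(assignOp.Length).Trim();
                    if (quotedVal.StartsWith("\"", StringComparison.Ordinal))
                    {
                        // Find the closing quote; escaped quotes are not supported
                        var endQuoteIndex = quotedVal.IndexOf('"', 1);
                        if (endQuoteIndex > 0)
                        {
                            newValue = quotedVal.Substring(1, endQuoteIndex - 1);
                        }
                    }
                }
            }
        }
        if (!string.IsNullOrWhiteSpace(newValue))
        {
            vals.Add(newValue);
            // Continue searching after the value that was just extracted
            startIndex = input.IndexOf('[', input.IndexOf(newValue, startIndex, StringComparison.Ordinal) + newValue.Length);
        }
        else startIndex = input.IndexOf('[', startIndex + 1);
    }
    // Join all values and split them into individual words
    return string.Join(" ", vals).Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
}

// StartsWithWhich is not shown in the original answer. This version is an assumption:
// it returns the first candidate that text starts with, or null if none matches.
string StartsWithWhich(string text, params string[] candidates)
{
    return candidates.FirstOrDefault(c => text.StartsWith(c, StringComparison.Ordinal));
}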
