
How to make a regex that works in multiline mode also work in single-line mode in C#

Consider the following string fragment:

var someInput = ..... +
"admin-state : up" +
"opr-state/tx-rate-ds : up :32093" +
"cur-op-mode : g993-2-8d" +
"tx-rate-us : 5048" +
"tx-rate-ds : 32093" +
"noise-margin-down : 204" +
"noise-margin-up : 165" +
"actual-tps-tc-mode : ptm" +
"overrule-state : not-created" +
.....;

I am trying to extract the three sections of the line:

"opr-state/tx-rate-ds : up :32093"

I am using regexstorm to try out my expressions, and to get each of the values I came up with these:

@"(?<paramName>opr-.[^\s]*)" // Gets "opr-state/tx-rate-ds"  
@"opr.*:\s*(?<middle>.*(?=:))" // Gets "up"  
@"opr.*:\s*.*:(?<value>[\d]*)" // Gets 32093

The problem is that these work when each line of the input is treated independently, but I get the input as a single string. That is basically the same as running the regex in single-line mode in the tester, so the results I get in the application are as follows:

@"(?<paramName>opr-.[^\s]*)" // Gets "opr-state/tx-rate-ds"  
@"opr.*:\s*(?<middle>.*(?=:))" // Gets everything from the first ": up" until the last ":" before "not-created"   
@"opr.*:\s*.*:(?<value>[\d]*)" // Gets 32093
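A minimal C# sketch of why the tester and the application disagree. It assumes a shortened version of the sample input, and it uses "\n" separators to stand in for the tester showing one entry per line. In .NET, . does not match \n unless RegexOptions.Singleline is set, so with line breaks the greedy .* cannot run past the end of the opr- line. Once everything is on one line (or Singleline is on), it runs ahead to later colons. The snippet uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input (an assumption; the real input is longer).
string[] lines =
{
    "admin-state : up",
    "opr-state/tx-rate-ds : up :32093",
    "cur-op-mode : g993-2-8d",
    "overrule-state : not-created",
};

string multiLine = string.Join("\n", lines); // one entry per line, as typed into the tester
string singleLine = string.Concat(lines);     // what the + concatenation in the question produces

const string pattern = @"opr.*:\s*(?<middle>.*(?=:))";

// Returns the "middle" group for the question's original (greedy) pattern.
string MiddleOf(string input, RegexOptions options = RegexOptions.None) =>
    Regex.Match(input, pattern, options).Groups["middle"].Value;

// Without Singleline, '.' stops at '\n', so .* cannot leave the opr- line.
Console.WriteLine($"[{MiddleOf(multiLine)}]"); // [up ]
// With everything on one line, the greedy .* only gives back characters until a later colon matches.
Console.WriteLine($"[{MiddleOf(singleLine)}]");
// RegexOptions.Singleline lets '.' match '\n', which reproduces the single-string behavior.
Console.WriteLine($"[{MiddleOf(multiLine, RegexOptions.Singleline)}]");
```

This also suggests why switching to Singleline did not help: it widens what . can match instead of narrowing it.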

Phrased in words, what I want this expression to do is something like:

In a single string, find whatever is between opr.*:\s* and the following colon

So far I've tried passing Singleline as the option to the Match method, and changing the expression to opr.*:\s*(?<middle>[^:]), but neither of those has worked.

I really suck at regular expressions, please help.

Thank you.

Use non-greedy repetition:

@"opr.*?:\s*(?<middle>.*?(?=:))"

.* tries to match as many characters as possible, while .*? matches only as few as it needs. Because you have clear boundaries (:), as few as it needs is just enough.
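A small C# comparison of the greedy and lazy versions on a shortened, single-string copy of the sample input (the shortened input is an assumption; the snippet uses C# 9 top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input, concatenated without line breaks as in the question.
string input = "admin-state : up" +
               "opr-state/tx-rate-ds : up :32093" +
               "cur-op-mode : g993-2-8d" +
               "overrule-state : not-created";

// Greedy: .* runs to the end of the string, then backs off only as far as a late colon.
string greedy = Regex.Match(input, @"opr.*:\s*(?<middle>.*(?=:))").Groups["middle"].Value;

// Lazy: .*? stops at the first colon that lets the rest of the pattern match.
string lazy = Regex.Match(input, @"opr.*?:\s*(?<middle>.*?(?=:))").Groups["middle"].Value;

Console.WriteLine($"greedy: [{greedy}]");
Console.WriteLine($"lazy:   [{lazy}]"); // lazy:   [up ]
```

Note that the lazy middle group keeps the space in front of the next colon; call Trim() on it, or exclude whitespace from what the group may match.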


The problem you're facing is that the regex engine is greedy by default. Any quantifier, such as *, ?, or {n,m}, will try to match as much as it can, backtracking only if the rest of the pattern doesn't match. I find this article quite useful for understanding the internals: Watch Out for The Greediness!
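To see greediness and backtracking in isolation, here is a short C# sketch using made-up strings rather than the question's data (C# 9 top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// (.*): first consumes all of "a:b:c", then backtracks until the final ':' can match.
string star = Regex.Match("a:b:c", @"(.*):").Groups[1].Value;

// \d{2,4} takes the maximum of 4 digits even though 2 would already satisfy it.
string range = Regex.Match("32093", @"\d{2,4}").Value;

// x? prefers to take the optional 'x' when it is present.
string optional = Regex.Match("xy", @"x?").Value;

Console.WriteLine(star);     // a:b
Console.WriteLine(range);    // 3209
Console.WriteLine(optional); // x
```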

Solution:
Use lazy quantifiers by adding an extra ? immediately after the quantifier. Examples:

  • .*?
  • \s+?
  • [az]{5,}?

These will try to match as few characters as they can, consuming more only when the engine backtracks because the rest of the pattern failed.
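A short C# sketch of the lazy quantifiers listed above, showing that they stop at their minimum on their own but expand when the rest of the pattern needs it. The input strings are made up for illustration (C# 9 top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// On its own, a lazy quantifier stops at its minimum...
string minOnly = Regex.Match("   x", @"\s+?").Value;              // a single space
string atLeastFive = Regex.Match("azazazaz", @"[az]{5,}?").Value; // exactly 5 characters

// ...but it consumes more when the rest of the pattern cannot match otherwise.
string expanded = Regex.Match("   x", @"\s+?x").Value;               // all three spaces plus 'x'
string firstColon = Regex.Match("a:b:c", @"(.*?):").Groups[1].Value; // stops at the first ':'

Console.WriteLine($"[{minOnly}]");    // [ ]
Console.WriteLine(atLeastFive);       // azaza
Console.WriteLine($"[{expanded}]");   // [   x]
Console.WriteLine(firstColon);        // a
```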

In your case, it works if you modify the expression to opr.*?:\s*(?<middle>[^:]+)
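A minimal C# check of that modified expression against a shortened, single-string version of the sample input (the shortened input is an assumption; C# 9 top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input (an assumption), concatenated without line breaks.
string input = "admin-state : up" +
               "opr-state/tx-rate-ds : up :32093" +
               "cur-op-mode : g993-2-8d";

// Lazy .*? stops at the first colon; [^:]+ then takes everything up to the next colon.
Match match = Regex.Match(input, @"opr.*?:\s*(?<middle>[^:]+)");
string middle = match.Groups["middle"].Value;

// [^:]+ also accepts spaces, so the space before the next ':' ends up in the group.
Console.WriteLine($"[{middle}] -> [{middle.Trim()}]"); // [up ] -> [up]
```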

However, let's try a different approach. With regular expressions, it helps to be as specific as you can. Looked at from another angle, every token you're trying to match consists of characters other than colons (:), or, even better, characters other than colons and whitespace.

Code:

using System;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"(?<paramName>  opr-[^\s:]+  )  # literal `opr-` followed by any chars except whitespace or `:`
                          \s*:\s*                        # separator: literal `:` optionally surrounded by any number of whitespace chars
                          (?<middle>  [^\s:]+  )         # any chars except whitespace or `:`
                          \s*:\s*                        # separator
                          (?<value>  \d+  )              # 1 or more digits (an integer)
                         ",
                        RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

foreach (Match itemMatch in regex.Matches(someInput))
{
    Console.WriteLine("{0}\t{1}\t{2}",
                      itemMatch.Groups["paramName"].Value,
                      itemMatch.Groups["middle"].Value,
                      itemMatch.Groups["value"].Value);
}

Note that I used RegexOptions.IgnorePatternWhitespace so that whitespace in the pattern is ignored and the # comments are allowed.
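A short C# sketch of what RegexOptions.IgnorePatternWhitespace changes, using a made-up input string (C# 9 top-level statements). One consequence worth knowing: with the option on, a space you actually want to match has to be escaped.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped whitespace is ignored and '#' starts a comment, so this is just up\s*:
var spaced = new Regex(@"up   \s*   :   # 'up', optional whitespace, then a colon",
                       RegexOptions.IgnorePatternWhitespace);

// Without the option, spaces in a pattern are literal, so this needs two spaces before ':'.
var plain = new Regex("up  :");

// With the option, a space that must match has to be escaped.
var escapedSpace = new Regex(@"up\ :", RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(spaced.IsMatch("up :32093"));       // True
Console.WriteLine(plain.IsMatch("up :32093"));        // False
Console.WriteLine(escapedSpace.IsMatch("up :32093")); // True
```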

[^\s:]+ is a negated character class, repeated one or more times, that matches any character except:

  • \s whitespace
  • : a literal colon

With that construct you don't need to worry about greediness, because the class itself can never match past a colon or whitespace.
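A small C# sketch showing that the negated class stays greedy yet still stops at the separators, so no lazy quantifier is needed. The input is the line from the question (C# 9 top-level statements; LINQ's Cast and Select are used only to collect the values).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// [^\s:]+ is greedy, but it can never consume a colon or whitespace.
string token = Regex.Match("up :32093", @"[^\s:]+").Value;

// Applied repeatedly, the same class yields each token between the separators.
string[] tokens = Regex.Matches("opr-state/tx-rate-ds : up :32093", @"[^\s:]+")
                       .Cast<Match>()
                       .Select(m => m.Value)
                       .ToArray();

Console.WriteLine(token);                      // up
Console.WriteLine(string.Join(" | ", tokens)); // opr-state/tx-rate-ds | up | 32093
```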

