Parse string in C# using Regex

Need to parse:

/subscriptions/1234/resourceGroups/5678/providers/BlaBlaBla/workspaces/BluBluBlu

and extract the variables:

  • 1234
  • 5678
  • BlaBlaBla
  • BluBluBlu

How can I do it in a clean way using C# and regular expressions?

One line of code:

var bits = noodly.Split('/');
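
To make the one-liner concrete, here is a small sketch of where the values end up after the split. Because the path starts with a slash, the first element is an empty string, so the keys sit at the odd indices and the values at the even indices 2, 4, 6 and 8. The SplitDemo class and the ExtractValues name are made up for illustration and assume the path always alternates key/value.

```csharp
using System.Collections.Generic;

public static class SplitDemo
{
    // Splits on '/' and keeps every second element starting at index 2.
    // Index 0 is the empty string before the leading slash; odd indices are the keys.
    public static string[] ExtractValues(string path)
    {
        var bits = path.Split('/');
        var values = new List<string>();
        for (var i = 2; i < bits.Length; i += 2)
        {
            values.Add(bits[i]);
        }
        return values.ToArray();
    }
}
```

Calling SplitDemo.ExtractValues on the path from the question gives 1234, 5678, BlaBlaBla and BluBluBlu, in that order.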

If you still need a regex and the values are positional (every other path segment), you could use a capture collection with this regex.

^(?:/[^/]*/([^/]*))+

The items are in group 1's capture collection.
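
As a rough illustration of why the capture collection matters: in .NET, a group that sits inside a repeated construct records one capture per repetition, while Group.Value only returns the last of them. The sketch below shows both; the CaptureDemo class and its method names are made up for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private const string Pattern = @"^(?:/[^/]*/([^/]*))+";

    // Each repetition of the outer group consumes one "/key/value" pair
    // and adds the value as a further capture of group 1.
    public static string[] AllCaptures(string input)
    {
        var match = Regex.Match(input, Pattern);
        if (!match.Success)
        {
            return new string[0];
        }
        return match.Groups[1].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    // Group.Value exposes only the last capture of a repeated group.
    public static string LastCapture(string input)
    {
        return Regex.Match(input, Pattern).Groups[1].Value;
    }
}
```

For the path in the question, AllCaptures returns all four values, whereas LastCapture returns only BluBluBlu.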

This is not intended as an answer, but as a note for future readers (I got bored).

Regex

return Regex.Matches(input, @"^(?:/[^/]*/([^/]*))+")[0]
            .Groups[1]
            .Captures.Cast<Capture>()
            .Select(m => m.Value)
            .ToArray();

regexCompiled

private static readonly Regex regex = new Regex(@"^(?:/[^/]*/([^/]*))+", RegexOptions.Compiled);
...

return regex.Matches(input)[0]
            .Groups[1]
            .Captures.Cast<Capture>()
            .Select(m => m.Value)
            .ToArray();

Split

return input.Split(new []{'/'}, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)
            .Where((x, i) => i % 2 == 0)
            .ToArray();

Unsafe

var list = new List<string>();
var result = string.Empty;

fixed (char* pInput = input)
{
   var plen = pInput + input.Length;
   var toggle = true;

   for (var p = pInput; p < plen; p++)
   {
      if (*p == '/')
      {       
         if (result.Length > 0)
            list.Add(result);
         toggle = !toggle;
         result = string.Empty;
         continue;
      }
      if (toggle)
         result += *p;
   }
}
list.Add(result);
return list.ToArray();

Benchmarks

----------------------------------------------------------------------------
Mode             : Release (64Bit)
Test Framework   : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version          : 10.0.17134
----------------------------------------------------------------------------
CPU Name         : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description      : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads)  : 4 (8)      : Architecture  : x64
Clock Speed      : 3901 MHz   : Bus Speed     : 100 MHz
L2Cache          : 1 MB       : L3Cache       : 8 MB
----------------------------------------------------------------------------

Results

--- Random characters -------------------------------------------------------
| Value         |  Average |  Fastest |   Cycles | Garbage | Test |    Gain |
--- Scale 1 -------------------------------------------------- Time 0.152 ---
| unsafe        | 2.131 µs | 1.461 µs | 10.567 K | 0.000 B | Pass | 78.42 % |
| split         | 3.874 µs | 2.922 µs | 16.804 K | 0.000 B | Pass | 60.76 % |
| regexCompiled | 7.313 µs | 5.845 µs | 29.310 K | 0.000 B | Pass | 25.93 % |
| regex         | 9.873 µs | 7.891 µs | 37.800 K | 0.000 B | Base |  0.00 % |
-----------------------------------------------------------------------------
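
The post does not define the Gain column, but the figures are consistent with gain = (1 - average / baseline average) × 100, using the uncompiled regex as the baseline. That reading is an inference from the numbers, not something the benchmark tool's author states. A small sketch to check it (the GainCheck class is illustrative):

```csharp
using System;

public static class GainCheck
{
    // Percentage by which 'average' is faster than 'baseline',
    // rounded to two decimals like the table above.
    public static double Gain(double average, double baseline)
    {
        return Math.Round((1.0 - average / baseline) * 100.0, 2);
    }
}
```

Feeding in the averages from the table with 9.873 µs as the baseline should reproduce the listed gains for unsafe, split and regexCompiled.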

Summary

Each approach was tested 1,000,000 times on different string combinations that reflect the original pattern.

Unsafe is just ridiculous and should not be used. Regex is neat and tidy, and split is not too unreadable either. As expected, split is faster than regex.

However, regex is not as slow as I thought it would be. In the end, it comes down to personal preference and your code reviewer.

Update

As sln rightly mentioned in a comment, the regex should be compiled for this to be a good benchmark. Note that I did not follow the suggestion to drop .Groups[1].Captures.Cast<Capture>().Select(m => m.Value).ToArray(); I kept it so that every approach returns an array of strings and the results stay comparable.

Compiling the regex to IL gives it a good performance boost: roughly 26% faster on average than the uncompiled regex in the results above.

Disclaimer: I have nothing against regex and use it all the time.
