
Parse string in C# using Regex

I need to parse:

/subscriptions/1234/resourceGroups/5678/providers/BlaBlaBla/workspaces/BluBluBlu

and extract the variables:

  • 1234
  • 5678
  • BlaBlaBla
  • BluBluBlu

How can I do it in a clean way using C# and regular expressions?

One-liner

var bits = noodly.Split('/');
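To make the one-liner concrete: because the path starts with '/', the first element of the split result is an empty string, the keys sit at the odd indices, and the values sit at the even indices from 2 onwards. The sketch below picks those out. The class and method names (SplitSketch, ValuesFromSplit) are made up for illustration, and it assumes the path always follows the /key/value/key/value shape from the question.

```csharp
public static class SplitSketch
{
    // Hypothetical helper (not from the original answer): returns the
    // values found at indices 2, 4, 6, ... of the array from Split('/').
    public static string[] ValuesFromSplit(string path)
    {
        string[] bits = path.Split('/');

        // bits[0] is "" because the path starts with '/',
        // bits[1] is the first key, bits[2] the first value, and so on.
        int count = (bits.Length - 1) / 2;
        var values = new string[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = bits[2 * i + 2];
        }
        return values;
    }
}
```

For the path in the question, SplitSketch.ValuesFromSplit(noodly) yields 1234, 5678, BlaBlaBla and BluBluBlu.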

If you still need a regex, and the values are positional (every other segment), you could use a capture collection with this regex:

^(?:/[^/]*/([^/]*))+

The items are in group 1's capture collection.
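The distinction matters because group 1 sits inside a repeated group: in .NET, Group.Value only holds the text from the last repetition, while Group.Captures keeps every repetition. A minimal sketch of the difference follows. The names CaptureSketch, AllCaptures and LastCapture are illustrative assumptions, and only the pattern comes from the answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSketch
{
    // Same pattern as above: repeated "/key/value" pairs, value in group 1.
    private static readonly Regex PairPattern =
        new Regex(@"^(?:/[^/]*/([^/]*))+");

    // Every value matched by group 1 across all repetitions.
    public static string[] AllCaptures(string input)
    {
        Match match = PairPattern.Match(input);
        if (!match.Success) return new string[0];
        return match.Groups[1].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    // Group.Value reflects only the final repetition of the group.
    public static string LastCapture(string input)
    {
        Match match = PairPattern.Match(input);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

For the path in the question, AllCaptures returns all four values, whereas LastCapture returns only BluBluBlu.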

This is not intended as an answer, but as a note to future readers (I got bored).

Regex

return Regex.Matches(input, @"^(?:/[^/]*/([^/]*))+")[0]
            .Groups[1]
            .Captures.Cast<Capture>()
            .Select(m => m.Value)
            .ToArray();

regexCompiled

private static readonly Regex regex = new Regex(@"^(?:/[^/]*/([^/]*))+", RegexOptions.Compiled);
...

return regex.Matches(input)[0]
            .Groups[1]
            .Captures.Cast<Capture>()
            .Select(m => m.Value)
            .ToArray();

Split

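// Split on '/', dropping the empty entry produced by the leading slash.
return input.Split(new []{'/'}, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)                        // drop the first key ("subscriptions")
            .Where((x, i) => i % 2 == 0)    // keep every other segment: the values
            .ToArray();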

Unsafe

var list = new List<string>();
var result = string.Empty;

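// Requires an unsafe context (unsafe method, project compiled with unsafe code allowed).
fixed (char* pInput = input)
{
   var plen = pInput + input.Length;
   // true while the current segment should be collected (values), false for keys
   var toggle = true;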

   for (var p = pInput; p < plen; p++)
   {
      if (*p == '/')
      {       
         if (result.Length > 0)
            list.Add(result);
         toggle = !toggle;
         result = string.Empty;
         continue;
      }
      if (toggle)
         result += *p;
   }
}
list.Add(result);
return list.ToArray();
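For readers who want to follow the toggle logic without pointers, here is a rough safe equivalent of the loop above, using a StringBuilder in place of string concatenation. This variant is not part of the original benchmark, and the names ToggleSketch and EveryOtherSegment are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ToggleSketch
{
    // Walks the input once and collects every other '/'-delimited segment,
    // mirroring the toggle logic of the unsafe version.
    public static string[] EveryOtherSegment(string input)
    {
        var list = new List<string>();
        var current = new StringBuilder();
        var toggle = true; // flips at each '/', so keys are skipped

        foreach (char c in input)
        {
            if (c == '/')
            {
                // A segment just ended: keep it if anything was collected.
                if (current.Length > 0)
                    list.Add(current.ToString());
                toggle = !toggle;
                current.Clear();
                continue;
            }
            if (toggle)
                current.Append(c);
        }

        // Like the original, the final segment is added unconditionally.
        list.Add(current.ToString());
        return list.ToArray();
    }
}
```

Like the unsafe version, it assumes the input ends with a value segment; an input ending in a key or a trailing slash would add an empty string as the last element.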

Benchmarks

----------------------------------------------------------------------------
Mode             : Release (64Bit)
Test Framework   : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version          : 10.0.17134
----------------------------------------------------------------------------
CPU Name         : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description      : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads)  : 4 (8)      : Architecture  : x64
Clock Speed      : 3901 MHz   : Bus Speed     : 100 MHz
L2Cache          : 1 MB       : L3Cache       : 8 MB
----------------------------------------------------------------------------

Results

--- Random characters -------------------------------------------------------
| Value         |  Average |  Fastest |   Cycles | Garbage | Test |    Gain |
--- Scale 1 -------------------------------------------------- Time 0.152 ---
| unsafe        | 2.131 µs | 1.461 µs | 10.567 K | 0.000 B | Pass | 78.42 % |
| split         | 3.874 µs | 2.922 µs | 16.804 K | 0.000 B | Pass | 60.76 % |
| regexCompiled | 7.313 µs | 5.845 µs | 29.310 K | 0.000 B | Pass | 25.93 % |
| regex         | 9.873 µs | 7.891 µs | 37.800 K | 0.000 B | Base |  0.00 % |
-----------------------------------------------------------------------------

Summary

Each approach was tested 1,000,000 times on different string combinations that reflect the original pattern.

Unsafe is just ridiculous and should not be used; regex is neat and tidy, and split is not too unreadable either. As expected, split is faster.

However, regex is not as slow as I thought it would be. In the end, it comes down to personal preference and your code reviewer.

Update

As sln rightly mentioned in a comment, the regex should be compiled for this to be a good benchmark. Note that I did not adopt the suggestion to leave out the .Groups[1].Captures.Cast<Capture>().Select(m => m.Value).ToArray(); part. I kept it so that every approach returns an array of strings and the results stay comparable.

The compilation to IL gives regex a good performance boost.

Disclaimer: I have nothing against regex and use it all the time.
