Splitting a string with regular expressions in .NET

I need a regular expression that I can pass to Regex.Split() in .NET to examine a string and return specific items. I've been trying to do this on my own, but I can never seem to get what I need, and the results never make any sense. Obviously I do not have a good handle on writing regular expressions.

So here is the string:

"%date - %-5level - [%thread] - %logger - %message - %exception%newline"

I essentially want to get back an array that looks like the following:

"date"
"-5level"
"thread"
"logger"
"message"
"exception"
"newline"

The following code is close, but not quite.

Regex exp = new Regex(@"\W+");
string[] s = exp.Split(@"%date - %-5level - [%thread] - %logger - %message - %exception%newline");

I get the following:

""
"date"
"5level"
"thread"
"logger"
"message"
"exception"
"newline"

For some reason, I have an empty string at the first index, and the third element is missing the "-". I assume that is because "-" is not part of a "word".

The "-" aside for the moment, I then want to split "5level" into an array:

"5"
"level"

I experimented with this:

Regex exp2 = new Regex(@"(\d+)([a-zA-Z]+)");
string[] s2 = exp2.Split("5level");

But it returns 2 empty strings in addition to the split items I want, like so:

""
"5"
"level"
""

I'm stumped on how to write the expression to give me what I need. Any help would be appreciated.

Instead of using Regex.Split, it might be easier to match the tokens you need:

// Here s is the input format string. Cast/Select/ToArray need "using System.Linq;".
MatchCollection matches = Regex.Matches(s, @"%([\w\-]+)");
string[] words = matches.Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

As you've seen, Split may add empty entries, which then have to be filtered out.
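
For comparison, here is a minimal, self-contained sketch of both approaches applied to the format string from the question. The class and method names (TokenDemo, ByMatches, BySplit) are made up for illustration, and the delimiter pattern \W*% used with Split is my own assumption rather than something from the answer above. The empty leading entry that Split produces is dropped with a LINQ Where.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Hypothetical helper: collect every token that directly follows a '%'.
    public static string[] ByMatches(string input)
    {
        return Regex.Matches(input, @"%([\w\-]+)")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    // Hypothetical helper: treat "any run of non-word characters ending in '%'"
    // as the delimiter, then drop the empty entry Split emits when the input
    // starts with a delimiter.
    public static string[] BySplit(string input)
    {
        return Regex.Split(input, @"\W*%")
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string format = "%date - %-5level - [%thread] - %logger - %message - %exception%newline";

        Console.WriteLine(string.Join(", ", ByMatches(format)));
        Console.WriteLine(string.Join(", ", BySplit(format)));
        // Both lines print:
        // date, -5level, thread, logger, message, exception, newline
    }
}
```

The Matches version states what you want to keep, while the Split version states what you want to throw away, which is why it needs the extra filtering step.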

A better way of doing this is to use named capturing groups from the regex engine and to filter out any empty matches in the LINQ query.

MatchCollection matches = Regex.Matches(s, @"%(?<SomeName>[\w\-]+)");
string[] words = matches.Cast<Match>().Where(m => m.Length > 0).Select(m => m.Groups["SomeName"].Value).ToArray();
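
Named groups can also cover the follow-up in the question (breaking "-5level" into its numeric part and its name) without a second Split. The sketch below assumes that an optional signed integer may sit right after the '%' and that the name starts with a letter; the group names width and name, the FormatTokenDemo class, and the ParseTokens helper are all illustrative. It uses tuples, so it needs C# 7 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormatTokenDemo
{
    // Optional signed width (e.g. "-5") followed by a name starting with a letter.
    private static readonly Regex Token =
        new Regex(@"%(?<width>-?\d+)?(?<name>[a-zA-Z]\w*)");

    // Hypothetical helper: returns (width, name) pairs.
    // Width is the empty string when the token has no numeric prefix.
    public static (string Width, string Name)[] ParseTokens(string input)
    {
        return Token.Matches(input)
                    .Cast<Match>()
                    .Select(m => (m.Groups["width"].Value, m.Groups["name"].Value))
                    .ToArray();
    }

    public static void Main()
    {
        foreach (var (width, name) in ParseTokens("%date - %-5level - [%thread]"))
        {
            string shownWidth = width.Length > 0 ? width : "(none)";
            Console.WriteLine("name=" + name + " width=" + shownWidth);
        }
    }
}
```

Because each group captures only the part it describes, there are no empty strings around the pieces, unlike Split with capturing groups, where the whole input is consumed as one delimiter and the text before and after it is empty.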
