Parse string in C# extension method with regex
I need to create an extension method that parses (splits) my string.
For example, if I have the string:
COMMAND 1 PROCESSED "JOB command" 20160801 09:05:24
it should be split like this:
COMMAND
1
PROCESSED
"JOB command"
20160801
09:05:24
Another example. If I have the string:
COMMAND 2 ERROR 06 00000032 "Message window is still active." 20160801 09:05:24
it should be split like this:
COMMAND
2
ERROR
06
00000032
"Message window is still active."
20160801
09:05:24
I have a solution for this, but I am sure there is a much cleaner one.
My solution:
public static List<string> GetTokens(this string line)
{
    // TODO: Code refactoring:
    var res = new List<string>();
    var parts = Regex.Split(line, "/[^\\s\"']+|\"([^\"]*)\"|'([^']*)'/g");
    var subParts = parts[0].Split(' ');
    foreach (var val in subParts)
    {
        res.Add(val);
    }
    res.Add(parts[1]);
    subParts = parts[2].Split(' ');
    foreach (var val in subParts)
    {
        res.Add(val);
    }
    res.RemoveAll(f => f.Trim() == "");
    return res;
}
I would like to implement a cleaner solution. Any ideas?
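To see why this solution only handles one specific shape of input, it can help to print what Regex.Split returns. The sketch below runs the question's pattern unchanged; the SplitDiagnostics class and its method names are made up for illustration. In .NET the /.../g delimiters are not regex-literal syntax as in JavaScript, so the slashes and the g are matched as literal characters, and for input without / or ' only the double-quoted alternative can match. Because Regex.Split also inserts captured group text into its result, parts[1] is the quoted text without its quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDiagnostics
{
    // Hypothetical helper: applies the question's pattern exactly as written.
    public static string[] SplitLikeQuestion(string line)
    {
        // '/' and 'g' are literal characters in a .NET pattern, so only the
        // "([^"]*)" alternative can match input without '/' or '\''.
        return Regex.Split(line, "/[^\\s\"']+|\"([^\"]*)\"|'([^']*)'/g");
    }

    // Prints each part in brackets so leading/trailing spaces are visible.
    public static void Print(string line)
    {
        foreach (var part in SplitLikeQuestion(line))
            Console.WriteLine("[" + part + "]");
    }
}
```

For the first example string this gives three parts: "COMMAND 1 PROCESSED " (with a trailing space), "JOB command" and " 20160801 09:05:24". A line without any quoted section comes back as a single part, so parts[1] in the original method would throw an IndexOutOfRangeException, and a second quoted section and anything after it would be dropped.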
I suggest implementing a simple loop instead of a complex regular expression:
public static IEnumerable<String> GetTokens(this string value) {
    if (string.IsNullOrEmpty(value))
        yield break; // or throw an exception in case of value == null

    bool inQuotation = false; // are we inside "..."?
    int index = 0;            // start of the current token

    for (int i = 0; i < value.Length; ++i) {
        char ch = value[i];

        if (ch == '"')
            inQuotation = !inQuotation;
        else if ((ch == ' ') && (!inQuotation)) {
            if (i > index) // skip empty tokens caused by consecutive spaces
                yield return value.Substring(index, i - index);

            index = i + 1;
        }
    }

    if (index < value.Length)
        yield return value.Substring(index, value.Length - index);
}
Test
var source =
    "COMMAND 2 ERROR 06 00000032 \"Message window is still active.\" 20160801 09:05:24";
Console.Write(string.Join(Environment.NewLine, GetTokens(source)));
Output
COMMAND
2
ERROR
06
00000032
"Message window is still active."
20160801
09:05:24
Edit: in case you want two quotation types, " (double) as well as ' (single):
public static IEnumerable<String> GetTokens(this string value) {
    if (string.IsNullOrEmpty(value))
        yield break;

    bool inQuotation = false; // inside "..."
    bool inApostroph = false; // inside '...'
    int index = 0;            // start of the current token

    for (int i = 0; i < value.Length; ++i) {
        char ch = value[i];

        if (inQuotation)
            inQuotation = ch != '"';   // a closing " ends the section
        else if (inApostroph)
            inApostroph = ch != '\'';  // a closing ' ends the section
        else if (ch == '"')
            inQuotation = true;
        else if (ch == '\'')
            inApostroph = true;
        else if (ch == ' ') {          // a space outside any quotes
            if (i > index)             // skip empty tokens
                yield return value.Substring(index, i - index);

            index = i + 1;
        }
    }

    if (index < value.Length)
        yield return value.Substring(index, value.Length - index);
}
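The two flags can also be folded into a single variable that remembers which quote character opened the current section, which makes it easy to support further quote characters. This is a variation on the answer, not part of it: the QuoteAwareTokenizer class, the Tokenize name and the quoteChars parameter are hypothetical. It also skips the empty tokens that consecutive spaces would otherwise produce.

```csharp
using System.Collections.Generic;

public static class QuoteAwareTokenizer
{
    // Splits on spaces outside quotes. Any character in quoteChars opens a
    // quoted section, which ends at the next occurrence of the same character.
    public static IEnumerable<string> Tokenize(this string value, string quoteChars = "\"'")
    {
        if (string.IsNullOrEmpty(value))
            yield break;

        char? openQuote = null; // quote char of the current section, or null
        int index = 0;          // start of the current token

        for (int i = 0; i < value.Length; ++i)
        {
            char ch = value[i];

            if (openQuote.HasValue)
            {
                if (ch == openQuote.Value)
                    openQuote = null;            // closing quote
            }
            else if (quoteChars.IndexOf(ch) >= 0)
            {
                openQuote = ch;                  // opening quote
            }
            else if (ch == ' ')
            {
                if (i > index)                   // ignore runs of spaces
                    yield return value.Substring(index, i - index);
                index = i + 1;
            }
        }

        if (index < value.Length)
            yield return value.Substring(index);
    }
}
```

A quote of the other kind inside a section is treated as ordinary text, so say "it's" ok yields say, "it's" and ok.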
After a while I figured out some simple code:
public static List<string> GetTokens(this string line)
{
    return Regex.Matches(line, @"([^\s""]+|""([^""]*)"")")
        .OfType<Match>()
        .Select(l => l.Groups[1].Value)
        .ToList();
}
I tested the code with a MessageBox that showed the List with | between each item.
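In that pattern, group 1 wraps the whole alternation, so Groups[1].Value is the same as the full match and keeps the quotes around a quoted token. If the quoted text is wanted without its quotes, the inner group 2 succeeds only for the quoted alternative and can be used instead. A sketch, assuming that is the output you want (the UnquotedTokens class and GetUnquotedTokens method are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnquotedTokens
{
    public static List<string> GetUnquotedTokens(this string line)
    {
        return Regex.Matches(line, @"([^\s""]+|""([^""]*)"")")
            .Cast<Match>()
            // Group 2 matched only for "quoted" tokens: take its contents there,
            // otherwise fall back to the whole token captured by group 1.
            .Select(m => m.Groups[2].Success ? m.Groups[2].Value : m.Groups[1].Value)
            .ToList();
    }
}
```

For the second example this returns Message window is still active. (without the surrounding quotes) as the sixth token.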
You can use a regex like this one with the global flag:
([^\s"]+|"[^"]*")
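.NET regular expressions have no g flag: Regex.Matches already returns every match, and Match.NextMatch() steps from one match to the next. A sketch of the latter with this pattern, written as a lazy iterator so tokens are produced on demand; the LazyTokens class and EnumerateTokens method are hypothetical names. In a C# verbatim string the pattern's quotes are doubled, while \s needs no extra backslash.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyTokens
{
    // Yields one token per match: either a run of characters that are neither
    // whitespace nor quotes, or a "quoted section" including its quotes.
    public static IEnumerable<string> EnumerateTokens(this string line)
    {
        for (Match m = Regex.Match(line, @"([^\s""]+|""[^""]*"")"); m.Success; m = m.NextMatch())
            yield return m.Value;
    }
}
```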
A pure regex solution:
public static List<string> GetTokens(this string line)
{
    return Regex.Matches(line, @""".*?""|\S+")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
}
The ".*?"|\S+ regex matches either a quoted string or a sequence of non-space characters. These matches can then be returned as a collection in one go.
Here is a demo: https://ideone.com/hmLQIt.
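If GetTokens is called for many lines, the Regex can also be built once and kept in a static field. The static Regex.Matches overload already caches recently used patterns (see Regex.CacheSize), so this is mostly about keeping the pattern and its options in one place; RegexOptions.Compiled is optional and trades a slower first use for faster matching. The TokenRegex class and Tokenizer field below are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenRegex
{
    // A quoted string (shortest match up to the next quote) or a run of non-space characters.
    private static readonly Regex Tokenizer = new Regex(@""".*?""|\S+", RegexOptions.Compiled);

    public static List<string> GetTokens(this string line)
    {
        return Tokenizer.Matches(line).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

With an unterminated quote the first alternative cannot find a closing quote, so \S+ takes over: the input a "b c is split into a, "b and c.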