Regex to split and ignore brackets
I need to split text on commas, but the text also contains commas inside brackets, and those must be ignored.
Input text: Selectroasted peanuts, Sugars (sugar, fancymolasses) ,Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.
Expected output:
My code:
string pattern = @"\s*(?:""[^""]*""|\([^)]*\)|[^, ]+)";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine("{0}", m.Value);
}
The output I am getting:
Please help.
You can use
string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input.TrimEnd(new[] {'!', '?', '.', '…'}), pattern))
{
    Console.WriteLine("{0}", m.Value);
}
// => Selectroasted peanuts
//    Sugars (sugar, fancymolasses)
//    Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
//    Salt
See the C# demo and the regex demo, too. The pattern matches one or more occurrences of:
"[^"]*" - a ", zero or more chars other than ", and then a "
| - or
\([^()]*\) - a (, then zero or more chars other than ( and ), and then a ) char
| - or
[^,] - a char other than a ,

Note that the .TrimEnd(new[] {'!', '?', '.', '…'}) part in the code snippet is meant to remove the trailing sentence punctuation; if you can afford Salt. in the output, you can remove that part.
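The input text in the question has stray spaces around some commas (for example "fancymolasses) ,Hydrogenatedvegetable"), so the raw matches can carry leading or trailing whitespace. A minimal sketch of one way to handle that is to wrap the same pattern in a helper that trims each match. The names IngredientSplitter and SplitOutsideBrackets are made up for illustration, and the sketch assumes brackets are not nested:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IngredientSplitter // hypothetical name, not from the original answer
{
    // One item is a run of: a "quoted" chunk, a (bracketed) chunk without
    // nested brackets, or any single char other than a comma.
    private static readonly Regex ItemPattern =
        new Regex(@"(?:""[^""]*""|\([^()]*\)|[^,])+");

    public static List<string> SplitOutsideBrackets(string input)
    {
        // Drop trailing sentence punctuation, as in the snippet above.
        string trimmed = input.TrimEnd('!', '?', '.', '…');

        return ItemPattern.Matches(trimmed)
            .Cast<Match>()
            .Select(m => m.Value.Trim())   // remove whitespace around each item
            .Where(s => s.Length > 0)      // skip items that were only whitespace
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "Selectroasted peanuts, Sugars (sugar, fancymolasses) ,Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
        foreach (string item in IngredientSplitter.SplitOutsideBrackets(input))
        {
            Console.WriteLine(item);
        }
        // Prints:
        // Selectroasted peanuts
        // Sugars (sugar, fancymolasses)
        // Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
        // Salt
    }
}
```

Trimming after matching keeps the pattern itself identical to the one in the answer, so the explanation above still applies to it unchanged.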