
Regex to split and ignore brackets

I need to split the text by commas, but the text also contains commas inside brackets, and those need to be ignored.

Input text: Selectroasted peanuts, Sugars (sugar, fancymolasses) ,Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.

Expected output:

  • Selectroasted peanuts
  • Sugars (sugar, fancymolasses)
  • Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
  • Salt

My code:

string pattern = @"\s*(?:""[^""]*""|\([^)]*\)|[^, ]+)";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt."; 
foreach (Match m in Regex.Matches(input, pattern)) 
{ 
Console.WriteLine("{0}", m.Value); 
}

The output I am getting:

  • Selectroasted
  • peanuts
  • Sugars
  • (sugar, fancymolasses)
  • Hydrogenatedvegetable
  • oil
  • (cottonseed and rapeseed oil)
  • Salt

Please help.

You can use

string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt."; 
foreach (Match m in Regex.Matches(input.TrimEnd(new[] {'!', '?', '.', '…'}), pattern)) 
{ 
    Console.WriteLine("{0}", m.Value); 
}
// => Selectroasted peanuts
//    Sugars (sugar, fancymolasses)
//    Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
//    Salt

See the C# demo and the regex demo, too. The pattern matches one or more occurrences of:

  • "[^"]*" - a " char, then zero or more chars other than ", and then a " char
  • | - or
  • \([^()]*\) - a ( char, then zero or more chars other than ( and ), and then a ) char
  • | - or
  • [^,] - any char other than a comma.
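To see how the three alternatives cooperate, here is a minimal sketch that runs the same pattern over a short made-up string containing both a quoted part and a parenthesized part. The class name PatternDemo, the method MatchItems, and the sample input are assumptions made for illustration; they do not come from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Hypothetical helper: returns every match of the answer's pattern as a string.
    public static List<string> MatchItems(string text)
    {
        // Same pattern as in the answer: a quoted run, a parenthesized run,
        // or any single non-comma char, repeated one or more times.
        string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
        var items = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            items.Add(m.Value);
        }
        return items;
    }

    public static void Main()
    {
        // Made-up input: commas inside quotes and parentheses must not split.
        string input = @"a,""b, c"",d (e, f)";
        foreach (string item in MatchItems(input))
        {
            Console.WriteLine(item);
        }
        // => a
        //    "b, c"
        //    d (e, f)
    }
}
```

Because the quoted and parenthesized alternatives are tried before [^,], a comma inside them is consumed as part of the item and never acts as a separator. Note that \([^()]*\) only covers one level of parentheses, so nested brackets are outside what this sketch handles.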

Note that the .TrimEnd(new[] {'!', '?', '.', '…'}) part in the code snippet is meant to remove the trailing sentence punctuation; if you can afford to have Salt. in the output, you can remove that part.
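The input text quoted in the question has spaces around some commas (", Sugars" and ") ,Hydrogenatedvegetable"), and [^,] happily matches those spaces, so the items can come back with leading or trailing whitespace. The following is a sketch of one way to handle that, assuming it is acceptable to trim each match; the class IngredientSplitter and the method SplitIngredients are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IngredientSplitter
{
    // Hypothetical helper: applies the answer's pattern, then trims each item
    // and drops items that are empty after trimming.
    public static List<string> SplitIngredients(string text)
    {
        string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
        // Strip trailing sentence punctuation first, as in the answer.
        string cleaned = text.TrimEnd('!', '?', '.', '…');
        return Regex.Matches(cleaned, pattern)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .Where(s => s.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // The input exactly as written in the question text, spaces included.
        string input = "Selectroasted peanuts, Sugars (sugar, fancymolasses) ,Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
        foreach (string item in SplitIngredients(input))
        {
            Console.WriteLine(item);
        }
        // => Selectroasted peanuts
        //    Sugars (sugar, fancymolasses)
        //    Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
        //    Salt
    }
}
```

Trimming after matching keeps the pattern itself unchanged, which seemed simpler here than adding whitespace handling to the regex.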
