Regex match and split string in C#

I have some ASCII documents in the following format:

[section heading]
paragraphs......

[section heading]
paragraphs......
...

Note: the heading text is always enclosed in a specific pattern (e.g. [ ] in the example above).

I want to split the file into separate sections, each with a heading and its content.

What would be the most efficient way to parse the above document?

Using Regex.Match() I can extract the headings, but not the text content that follows them.

Using Regex.Split() I can grab the content, but not the related headings.

Is it possible to combine these two Regex methods to parse the document? Are there better ways to achieve the same result?

Try this:

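// Requires: using System.Text.RegularExpressions;
// Verbatim string (@"...") so the backslashes reach the regex engine unescaped.
string search = @"\[([\w ]+)\]([^\[]*)";
foreach (Match match in Regex.Matches(yourtext, search))
{
    string heading = match.Groups[1].Value; // text between [ and ]
    string text = match.Groups[2].Value;    // everything up to the next '['
}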

The regular expression captures both the heading and the paragraphs. Thanks to the capturing groups (the parts in parentheses), you can extract both of them by iterating over the matches. Note that the body group stops at the first '[' it meets, so this pattern assumes '[' never appears inside a paragraph.
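For completeness, here is a minimal self-contained sketch of the same idea. The class name SectionParser, the method name Parse, and the choice to trim the body are illustrative assumptions, not part of the answer above; the pattern itself is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from the answer above.
public static class SectionParser
{
    // Group 1: heading text inside [ ]; group 2: everything up to the next '['.
    private static readonly Regex SectionPattern =
        new Regex(@"\[([\w ]+)\]([^\[]*)");

    // Returns (heading, body) pairs in document order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var sections = new List<KeyValuePair<string, string>>();
        foreach (Match match in SectionPattern.Matches(input))
        {
            string heading = match.Groups[1].Value;
            // Trimming is an assumption: it drops the newlines around the body.
            string body = match.Groups[2].Value.Trim();
            sections.Add(new KeyValuePair<string, string>(heading, body));
        }
        return sections;
    }
}
```

Calling SectionParser.Parse("[Intro]\nfirst paragraph\n\n[Usage]\nsecond paragraph\n") should yield two pairs, ("Intro", "first paragraph") and ("Usage", "second paragraph"). Headings containing characters other than word characters and spaces will not match [\w ]+ and would need a wider character class.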

(\[[^\]]*\])\n([\s\S]*?)(?=\n\[|$)

You can try this. Grab group 1 (the heading, including its brackets) and group 2 (the content). See the demo:

https://regex101.com/r/gU4aG0/1
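A rough sketch of how this second pattern might be used from C# follows. The class name LookaheadSectionParser and the line-ending normalization step are assumptions added for illustration: the pattern matches a bare \n after the heading, so files with Windows-style \r\n line endings would otherwise fail to match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the lookahead-based pattern above.
public static class LookaheadSectionParser
{
    // Group 1: the heading with its brackets; group 2: the body, matched lazily
    // until the next line that starts with '[' or the end of the input.
    private static readonly Regex SectionPattern =
        new Regex(@"(\[[^\]]*\])\n([\s\S]*?)(?=\n\[|$)");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        // Assumption: normalize \r\n to \n so the literal \n in the pattern matches.
        string normalized = input.Replace("\r\n", "\n");

        var sections = new List<KeyValuePair<string, string>>();
        foreach (Match match in SectionPattern.Matches(normalized))
        {
            string heading = match.Groups[1].Value;
            string body = match.Groups[2].Value.Trim();
            sections.Add(new KeyValuePair<string, string>(heading, body));
        }
        return sections;
    }
}
```

Unlike the first answer's pattern, this one only ends a section at a '[' that begins a line, so brackets in the middle of a paragraph line do not cut the body short. The headings it returns still include the surrounding brackets.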
