
C# Regex.Replace Multiple Newlines

I have a text file that contains what are more or less paragraphs. The text is not actually words, it's comma-delimited data, but that's not really important. The text file is divided into sections and subsections. Sections are separated by more than one newline, and subsections by a single newline.

So, sample data:

This is the, start of a, section
908690,246246246,246246
246246,246,246246

This is, the next, section,
sfhklj,sfhjk,4626246
4yw2,fdhds5juj,53ujj

So the above data contains two sections, each with three subsections. Sometimes, however, there is more than one empty line between sections. When this occurs, I want to convert the multiple newline characters, say \n\n\n\n, to just \n\n. I think regex is probably the way to do this. I may also need to handle different newline standards, Unix \n and Windows \r\n. I think the files probably contain a mix of line-ending conventions.

Here is the regex that I've come up with; it's nothing special:

Regex.Replace(input, @"([\r\n|\n]{2,})", Environment.NewLine + Environment.NewLine)

Firstly, is this a good regex solution? I'm not that good with regex.

Secondly, I then want to split each section into an element in a string array:

Regex.Split(input, Environment.NewLine + Environment.NewLine)

Is there a way to combine these steps?

[\r\n|\n] is wrong. That's a character class that matches any one of the characters \r, \n, or |. Inside square brackets the | is a literal pipe, not an alternation operator.
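A quick check makes the problem concrete. The sketch below uses made-up sample strings to show that the class accepts a literal pipe and that, because \r and \n each count as one character, a single Windows line break already satisfies {2,}:

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, put this in Main.
// The pattern from the question: a character class, not an alternation.
var broken = new Regex(@"[\r\n|\n]{2,}");

// Inside [...] the '|' is a literal, so two pipes satisfy {2,}.
bool matchesPipes = broken.IsMatch("a||b");

// "\r\n" is two characters from the class, so ONE Windows line break matches
// and would be replaced as if it were a run of blank lines.
bool matchesSingleCrLf = broken.IsMatch("a\r\nb");

// A single Unix newline is only one character, so it is left alone.
bool matchesSingleLf = broken.IsMatch("a\nb");

Console.WriteLine($"{matchesPipes} {matchesSingleCrLf} {matchesSingleLf}"); // prints "True True False"
```

So on a file with Windows line endings, the original pattern would turn every ordinary line break into a section break.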

Common idioms for matching a generic line separator are (?:\r\n|[\r\n]) or (?:\n|\r\n?). These will match \r\n (DOS/Windows), \r (older Macintosh), or \n (Unix/Linux/Mac OS X).

I would normalize all line separators to \n, then split on two or more of those:
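To see why the alternation matters, here is a small sketch (the input string is invented for illustration) that counts separators with both idioms and, for contrast, with a bare character class:

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, put this in Main.
// Four fields separated by one Windows, one old-Mac, and one Unix separator.
string mixed = "one\r\ntwo\rthree\nfour";

// Both idioms consume a \r\n pair as ONE separator:
// the first tries \r\n before the single characters,
// the second makes the \n after \r optional.
int countA = Regex.Matches(mixed, @"(?:\r\n|[\r\n])").Count;
int countB = Regex.Matches(mixed, @"(?:\n|\r\n?)").Count;

// A bare character class sees \r\n as two separate separators.
int countNaive = Regex.Matches(mixed, @"[\r\n]").Count;

Console.WriteLine($"{countA} {countB} {countNaive}"); // prints "3 3 4"
```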

Regex.Split(Regex.Replace(source, @"(?:\r\n|[\r\n])", "\n"), @"\n{2,}")
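As a rough end-to-end sketch, the one-liner can be applied to the sample data from the question. The particular mix of line endings and the number of blank lines below are assumptions made for the demo:

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, put this in Main.
// Sample data from the question, with mixed line endings and extra blank
// lines between the sections (the exact mix is an assumption for the demo).
string source =
    "This is the, start of a, section\r\n" +
    "908690,246246246,246246\r\n" +
    "246246,246,246246\r\n" +
    "\r\n\n\n" +
    "This is, the next, section,\n" +
    "sfhklj,sfhjk,4626246\n" +
    "4yw2,fdhds5juj,53ujj";

// Step 1: normalize every separator to \n.
string normalized = Regex.Replace(source, @"(?:\r\n|[\r\n])", "\n");

// Step 2: split wherever two or more newlines occur in a row.
string[] sections = Regex.Split(normalized, @"\n{2,}");

foreach (string section in sections)
{
    // After normalization a single \n separates the subsections.
    string[] subsections = section.Split('\n');
    Console.WriteLine($"{subsections.Length} subsections, first: {subsections[0]}");
}
```

If the file can start or end with newlines, trimming the input before splitting avoids empty elements at either end of the result.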

I would just use String.Split: first split the text into sections using a double newline as the delimiter, then split each section into subsections using a single newline as the delimiter. You will then end up with the array you wanted. You can use a List<string> object as the container and add the array returned from each Split call to it using AddRange.
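A minimal sketch of that approach might look like the following. It assumes the text uses \n only (mixed line endings would need normalizing first), and the sample string and variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;

// Top-level statements (C# 9+); on older compilers, put this in Main.
// Assumes \n line endings only; three newlines separate the two sections.
string text = "a,b\nc,d\n\n\ne,f\ng,h";

var subsections = new List<string>();

// Split into sections on a double newline, dropping the empty pieces
// that appear when there are four or more newlines in a row.
string[] sections = text.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

foreach (string section in sections)
{
    // An odd-length run leaves a leading \n on the next section; the empty
    // entry it produces is discarded by RemoveEmptyEntries as well.
    subsections.AddRange(section.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries));
}

Console.WriteLine(string.Join(" | ", subsections)); // prints "a,b | c,d | e,f | g,h"
```

Note that a flat List<string> no longer records where one section ends and the next begins; keeping the per-section arrays separately preserves that boundary.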
