
Remove extra blank lines in text file using regular expression

Hi, I want to remove extra blank lines in my source text file (that is, if there are 2 or more consecutive blank lines, keep only 1). I used this pattern:

^(\s*(\n|\r|\r\n)){2,}

It cannot handle an empty line at the end of the file, like this:

1. BlablablaCRLF
2. CRLF
3. 

Line 3 above is the end of the file, and VS StyleCop complains that there are multiple blank lines here. It looks like a newline at the end of the file, but there is actually nothing there: I turned on "Show All Characters" in Notepad++ expecting to see a CRLF at the end of the file, but there was none. My pattern cannot identify this case. How can I handle it? Thanks!

Basic Answer

If this is what you want to match:

  1. Multiple consecutive empty lines, where "multiple" means more than 1.
  2. All empty lines at the end of a file except the one implicitly generated by terminating the file with \n (which can be considered good practice).
  3. All redundant whitespace after the terminating \n.

Then this pattern might help you:

(^\s*(\r|\n)){2,}|^\s+(\r|\n)?\Z

Further Explanation

The first part, (^\s*(\r|\n)){2,}, takes care of case 1. The second part, ^\s+(\r|\n)?\Z, matches redundant empty lines at the end of a file or redundant whitespace following the terminating \n.
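As a rough illustration of how the two parts could be used together, here is a minimal Python sketch. It assumes Python's re module (whose ^ and \Z behaviour may differ in detail from Notepad++), and the function name collapse_blank_lines, the newline parameter, and the replacement policy (one newline for a run inside the text, nothing for a match that reaches the end of the input) are illustrative assumptions, not part of the original answer.

```python
import re

# The pattern from the answer; re.MULTILINE lets ^ match at the start of every line.
BLANK_RUNS = re.compile(r"(^\s*(\r|\n)){2,}|^\s+(\r|\n)?\Z", re.MULTILINE)


def collapse_blank_lines(text: str, newline: str = "\n") -> str:
    """Keep one blank line inside the text and none at the end (assumed policy)."""

    def replacement(match: re.Match) -> str:
        # A match that reaches the end of the input is trailing blank space: drop it.
        if match.end() == len(match.string):
            return ""
        # Otherwise the match is a run of blank lines: keep a single one.
        return newline

    return BLANK_RUNS.sub(replacement, text)
```

Under these assumptions, collapse_blank_lines("a\n\n\n\nb\n") yields "a\n\nb\n", and a file ending in an extra CRLF, such as "Blablabla\r\n\r\n", is trimmed to "Blablabla\r\n".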

If your file looks like this (with Unix line endings) ...

1. FirstLine\n
2. 
3. ThirdLine\n
4. FourthLine\n
5.
6.
7. SeventhLine\n

... then it only matches lines 5 and 6, but nothing at the end. Notepad++ will nevertheless show an 8th line at the end because of the terminating \n. However, if there were multiple \n characters at the end of the file, or additional \t or space characters after the \n that terminates line 7, these would match.
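The behaviour described for this sample file can be checked with a short Python sketch. It assumes Python's re module rather than Notepad++, and the names SAMPLE and matched_text are hypothetical helpers introduced only for this illustration.

```python
import re

# The pattern from the answer, compiled so that ^ matches at every line start.
PATTERN = re.compile(r"(^\s*(\r|\n)){2,}|^\s+(\r|\n)?\Z", re.MULTILINE)

# The seven-line sample file with Unix line endings (lines 2, 5 and 6 are empty).
SAMPLE = "FirstLine\n\nThirdLine\nFourthLine\n\n\nSeventhLine\n"


def matched_text(text: str) -> list[str]:
    """Return the exact text of every match, in order of appearance."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Only the two empty lines 5 and 6 match; the single empty line 2 and the
# terminating newline of line 7 are left alone.
print(matched_text(SAMPLE))
# An extra newline or stray whitespace after the terminator is matched as well.
print(matched_text(SAMPLE + "\n"))
print(matched_text(SAMPLE + " \t"))
```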

If you also want to match the line generated by the \n termination (and as a result remove the \n termination when replacing), you could instead use ^\s*\Z for the second part of the regular expression.

Additional explanation of \s*(\r|\n): this matches every allowed line-ending combination, as in abc\n, abc\r\n, or abc\r, because \s also includes \n and \r.

\Z matches the end of the whole file/input (whereas $ would only match a line's end).
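The difference between the two anchors can be seen in a small Python sketch. It assumes Python's re module in multiline mode; TEXT and match_positions are hypothetical names used only for this illustration.

```python
import re

# Two lines, each terminated by \n.
TEXT = "a\nb\n"


def match_positions(pattern: str, text: str) -> list[int]:
    """Return the offsets at which the (zero-width) pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, re.MULTILINE)]


# $ matches before every newline and at the very end of the input.
print(match_positions(r"$", TEXT))   # [1, 3, 4]
# \Z matches only at the very end of the input.
print(match_positions(r"\Z", TEXT))  # [4]
```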

I'm sure there might be a shorter version of the regular expression, but my first intention was to make it work and keep it understandable.
