RegEx in Notepad++ to find lines with less or more than n pipes
I have a large pipe-delimited text file that should have one 3-column record per line. Many of the records are split up by line breaks within a column.
I need to do a find/replace to get three, and only three, pipes per line/record.
Here's an example (I added the line breaks (\r\n) to demonstrate where they are and what needs to be replaced):
12-1234|The quick brown fox jumped over the lazy dog.|Every line should look similar to this one|\r\n
56-7890A|This record is split\r\n
\r\n
on to multiple lines|More text|\r\n
09-1234AS|\r\n
||\r\n
\r\n
56-1234|Some text|Some more text\r\n
|\r\n
76-5432ABC|A record will always start with two digits, a dash and four digits|There may or may not be up to three letters after the four digits|\r\n
The caveat is that I need to retain those mid-record line breaks for the target system. They need to be replaced with the literal string \.br\ instead. So the final result of the above should look like this:
12-1234|The quick brown fox jumped over the lazy dog.|Every line should look similar to this one|\r\n
56-7890A|This record is split\.br\\.br\on to multiple lines|More text|\r\n
09-1234AS|\.br\||\.br\\r\n
56-1234|Some text|Some more text\.br\|\r\n
76-5432ABC|A record will always start with two digits, a dash and four digits|There may or may not be up to three letters after the four digits|\r\n
As you can see, the mid-record line breaks have all been replaced with \.br\ and the end-of-line line breaks have been retained to keep each three-column/pipe record on its own line. Note the last record's text, explaining how each line/record begins. I included that in case it would help in building a regex to properly identify the beginning of a record.
I'm not sure if this can be done in one find/replace step or if it needs to be (or just should be) split up into a couple of steps.
I had the thought to first search for |\r\n, since all records end with a pipe and a CRLF, and replace those with the dummy text !@#$. Then search for the remaining line breaks with \r\n, which will be mid-column line breaks, and replace those with \.br\. Finally, replace the dummy text with the original pipe and line break that I want to keep, |\r\n.
That worked for all records except those that look like the third record in the first example, which has several line breaks after a pipe within the record. In a file as large as the one I am working with, it wasn't until much later that I found that the process above didn't properly catch those instances.
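The three-step idea above can be reproduced outside Notepad++ to see where it goes wrong. This is only a minimal Python sketch, not what was actually run on the file; the function name three_step_replace and the two sample strings are illustrative assumptions.

```python
DUMMY = "!@#$"       # placeholder text used in the question
MARKER = "\\.br\\"   # the literal text \.br\

def three_step_replace(text: str) -> str:
    """Protect pipe+CRLF, convert the remaining CRLFs, then restore."""
    text = text.replace("|\r\n", DUMMY)   # step 1: hide "end of record"
    text = text.replace("\r\n", MARKER)   # step 2: convert what is left
    return text.replace(DUMMY, "|\r\n")   # step 3: restore pipe+CRLF

# A record whose only mid-record break is NOT preceded by a pipe.
good = "56-1234|Some text|Some more text\r\n|\r\n"
# A record with line breaks directly after a pipe (third record above).
bad = "09-1234AS|\r\n||\r\n\r\n"

print(repr(three_step_replace(good)))
print(repr(three_step_replace(bad)))
```

For the second sample, every pipe followed by a CRLF is treated as an end of record in step 1, so the record stays split over two lines and the one converted \.br\ ends up in front of whatever record follows.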
You can use

(?:\G(?!^(?<!.))|^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)?)\K\R+

Replace with \\.br\\ (each backslash is doubled in the replacement field so that a literal backslash is inserted).

Details:

(?:\G(?!^(?<!.))|^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)?) - either the end of the previous match (\G(?!^(?<!.))) or (|) the start of a line, two digits, a hyphen, one or more digits, zero or more letters, a |, then any zero or more chars other than |, as few as possible, and then an optional sequence of | and any zero or more chars other than |, as few as possible (that is the ^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)? part)
\K - omit the text matched so far from the match
\R+ - one or more line breaks.
If you need to remove empty lines after this, use Edit > Line Operations > Remove Empty Lines.
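If the clean-up can be scripted instead of done in Notepad++, a different technique is also possible: cut the text at every record start and convert all but the last line break of each record. The sketch below is not the regex above (Python's built-in re module has no \G, \K or \R). It assumes CRLF line endings, that the file begins with a record, and that no continuation line happens to start like a record ID; the names RECORD_START and join_records are made up for illustration.

```python
import re

# Assumed record start, taken from the question's description: two digits,
# a dash, four digits, up to three letters, then the first pipe.
RECORD_START = re.compile(r"^\d{2}-\d{4}[A-Za-z]{0,3}\|", re.MULTILINE)

def join_records(text: str, marker: str = "\\.br\\") -> str:
    """Put each record on one line, turning inner CRLFs into `marker`."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    bounds = starts + [len(text)]
    out = []
    for begin, end in zip(bounds, bounds[1:]):
        record = text[begin:end]
        # Keep the final CRLF as the real end of the record.
        if record.endswith("\r\n"):
            record = record[:-2]
        out.append(record.replace("\r\n", marker) + "\r\n")
    return "".join(out)

sample = (
    "56-7890A|This record is split\r\n"
    "\r\n"
    "on to multiple lines|More text|\r\n"
    "09-1234AS|\r\n"
    "||\r\n"
    "\r\n"
    "56-1234|Some text|Some more text\r\n"
    "|\r\n"
)
print(join_records(sample))
```

Unlike \R+ in the regex, this keeps one \.br\ per original line break, so consecutive breaks are not collapsed into a single marker. For a very large file it would be worth reading and writing with newline="" so that Python does not translate the CRLFs.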