
RegEx in Notepad++ to find lines with less or more than n pipes

I have a large pipe-delimited text file that should have one 3-column record per line. Many of the records are split up by line breaks within a column.

I need to do a find/replace to get three, and only three, pipes per line/record.

Here's an example (I added the line breaks ( \r\n ) to demonstrate where they are and what needs to be replaced):

12-1234|The quick brown fox jumped over the lazy dog.|Every line should look similar to this one|\r\n
56-7890A|This record is split\r\n
\r\n
on to multiple lines|More text|\r\n
09-1234AS|\r\n
||\r\n
\r\n
56-1234|Some text|Some more text\r\n
|\r\n
76-5432ABC|A record will always start with two digits, a dash and four digits|There may or may not be up to three letters after the four digits|\r\n

The caveat is that I need to retain those mid-record line breaks for the target system. They need to be replaced with \.br\ . So the final result of the above should look like this:

12-1234|The quick brown fox jumped over the lazy dog.|Every line should look similar to this one|\r\n
56-7890A|This record is split\.br\\.br\on to multiple lines|More text|\r\n
09-1234AS|\.br\||\.br\\r\n
56-1234|Some text|Some more text\.br\|\r\n
76-5432ABC|A record will always start with two digits, a dash and four digits|There may or may not be up to three letters after the four digits|\r\n

As you can see, the mid-record line breaks have all been replaced with \.br\ and the end-of-record line breaks have been retained to keep each three-column/pipe record on its own line. Note the last record's text, which explains how each line/record begins. I included that in case it helps in building a regex that properly identifies the beginning of a record.

I'm not sure if this can be done in one find/replace step or if it needs to be (or just should be) split up into a couple of steps.

My first thought was to search for |\r\n , since all records end with a pipe and a CRLF, and replace those with dummy text !@#$ . Then search for the remaining line breaks with \r\n , which will be the mid-column line breaks, and replace those with \.br\ . Finally, replace the dummy text with the original line breaks that I want to keep, |\r\n .

That worked for all records except those that look like the third record in the first example, which has several line breaks after a pipe within the record. In a file as large as the one I am working with, it wasn't until much later that I found that this process didn't properly catch those instances.
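To make the failure concrete, here is a rough Python sketch of the three-step placeholder approach described above, run against the sample data. The function name three_step_replace is made up for illustration; it only mimics the plain-text find/replace steps and is not Notepad++ itself.

```python
# The sample from the question, with real CRLF line breaks.
SAMPLE = (
    "12-1234|The quick brown fox jumped over the lazy dog.|"
    "Every line should look similar to this one|\r\n"
    "56-7890A|This record is split\r\n"
    "\r\n"
    "on to multiple lines|More text|\r\n"
    "09-1234AS|\r\n"
    "||\r\n"
    "\r\n"
    "56-1234|Some text|Some more text\r\n"
    "|\r\n"
    "76-5432ABC|A record will always start with two digits, a dash and four digits|"
    "There may or may not be up to three letters after the four digits|\r\n"
)

PLACEHOLDER = "!@#$"  # dummy text from the question


def three_step_replace(text):
    """Hypothetical helper mirroring the three find/replace steps."""
    # Step 1: protect every "pipe + CRLF", assumed to be an end of record.
    text = text.replace("|\r\n", PLACEHOLDER)
    # Step 2: every remaining CRLF is treated as a mid-record break.
    text = text.replace("\r\n", "\\.br\\")
    # Step 3: restore the protected record endings.
    return text.replace(PLACEHOLDER, "|\r\n")


result = three_step_replace(SAMPLE)
lines = [line for line in result.split("\r\n") if line]
pipe_counts = [line.count("|") for line in lines]
print(pipe_counts)  # [3, 3, 1, 2, 3, 3]
```

The third record stays split across two lines (one pipe, then two), because a pipe followed by a CRLF inside a record is indistinguishable from a record ending in step 1. The blank line after it also ends up as a stray \.br\ at the front of the fourth record, even though that line still counts three pipes.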

You can use

(?:\G(?!^(?<!.))|^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)?)\K\R+

Replace with \\.br\\ . Details:

  • (?:\G(?!^(?<!.))|^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)?) - either the end of the previous match ( \G(?!^(?<!.)) ) or ( | ) the start of a line, two digits, a - , one or more digits, zero or more letters, a | , then any zero or more chars other than | , as few as possible, and then an optional sequence of | and any zero or more chars other than | , as few as possible (see ^\d{2}-\d+[A-Z]*\|[^|]*?(?:\|[^|]*?)? )
  • \K - omit the text matched so far from the overall match
  • \R+ - one or more line breaks.


If you need to remove empty lines after this, use Edit > Line Operations > Remove Empty Lines .
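If the file can be processed outside Notepad++, a similar result can be sketched in Python. Python's built-in re module does not support \G , \K or \R , so this is a different technique from the regex above: it replaces every CRLF that is not followed by a record start or by the end of the file. The record-start pattern (two digits, a dash, four digits, up to three uppercase letters, a pipe) is an assumption taken from the question's description, and join_split_records is a made-up name.

```python
import re

# Assumed record start, based on the question's description of a record ID.
RECORD_START = r"\d{2}-\d{4}[A-Z]{0,3}\|"

# A CRLF counts as "mid-record" unless a new record or the end of the
# text follows it.
MID_RECORD_BREAK = re.compile(r"\r\n(?!" + RECORD_START + r"|\Z)")


def join_split_records(text):
    """Replace mid-record CRLFs with the literal marker \\.br\\ ."""
    # A function replacement avoids backslash processing in the template.
    return MID_RECORD_BREAK.sub(lambda _match: "\\.br\\", text)


sample = (
    "56-7890A|This record is split\r\n"
    "\r\n"
    "on to multiple lines|More text|\r\n"
    "09-1234AS|\r\n"
    "||\r\n"
    "\r\n"
    "56-1234|Some text|Some more text\r\n"
    "|\r\n"
)

fixed = join_split_records(sample)
for line in fixed.split("\r\n"):
    print(line)
```

On this sample, each record ends up on one line with three pipes, and every mid-record break, including the empty lines, becomes its own \.br\ . One caveat: a continuation line that happens to begin with text shaped like a record ID followed by a pipe would be mistaken for a new record, so it is worth checking the pipe count per line afterwards. When reading a real file, open it with newline="" so Python does not translate the CRLFs.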
