Regex: replace multiple lines with a single line

I have a plain text file in which I need to replace multiple consecutive lines of text with a single replacement line. For example, when I have a date and time, followed by a blank line, followed by a page number:

11/13/2018 08:33:00

Page 1 of 1

I'd like to replace it with a single line (e.g., PAGE BREAK).

I've tried

sed 's/\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2}\n\nPage \d of \d/PAGE BREAK/g' file1.txt > file2.txt

and

perl -pe 's/\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2}\n\nPage \d of \d/PAGE BREAK/g' file1.txt > file2.txt

but it leaves the text unchanged.

Both sed and Perl process the input line by line, so a pattern that spans several lines never sees more than one of them at a time and cannot match. You can tell Perl to load the whole file into memory with -0777, provided the file is not too large:

perl -0777 -pe 's=[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\n\nPage [0-9]+ of [0-9]+=PAGE BREAK=g'

Note that I used [0-9], because \d can also match non-ASCII digits such as ٤, ໖, ६, or 𝟡 when the input is treated as Unicode text.

I also used s=== instead of s/// so I don't have to backslash-escape the slashes in the date part.

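Another Perl variant, which works line by line and counts the date, blank, and page-number lines as they arrive. Note that it counts each kind of line wherever it appears, so a stray blank line elsewhere in the file would be swallowed along with the lines that follow it: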

$ cat page_break.txt
123 45 jh kljl
11/13/2018 08:33:00

Page 1 of 1
ghjgjh hkjhj
fhfghfghfh
11/13/2018 08:33:00

Page 1 of 2
ghgigkjkj

$ perl -ne '{ if ( (/\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2}/ and $x++)or ( /^\s*$/ and $x++) or (/Page \d of \d/ and $x++) ){} if($x==0) { print "$_" } if($x==3) { print "PAGE BREAK\n"; $x=0} }' page_break.txt
123 45 jh kljl
PAGE BREAK
ghjgjh hkjhj
fhfghfghfh
PAGE BREAK
ghgigkjkj

$
