
Matching and deleting newline characters in a multiline Perl regex

I know there are tons of questions about matching multiline regexes with Perl on this site, but I'm still having trouble figuring out how to do the following. Any help or links to relevant questions would be highly appreciated.

I have a text file input.txt that is structured with a field-label (identified by a backslash) and field-contents, like this:

\x text
\y text text
text text
\z text

Field-contents can contain line breaks, but for further processing I need to make sure that all field contents are on one line. The following code apparently matches across multiple lines correctly, but it doesn't delete the line break; the line break still appears in the output.

#!/usr/bin/perl

$/ =undef; 

{
open(my $in, "<", "input.txt") or die "impossible: $!";
open(my $out, ">", "output.txt") or die "Can't open output.txt: $!"; 

while (<$in>) {
    s/\n([^\\])/ \1/g; # delete all line breaks unless followed by backslash and replace by a single space
    print $out $_ ; 
    }       
}

It adds the space to the front (so I know it correctly finds it) but nonetheless keeps the newline character. Output looks like this:

\x text
\y text text
 text text
\z text

Whereas I was hoping to get this:

\x text
\y text text text text
\z text

I think your input has a carriage return-linefeed pair (\r\n) at the end of each line. You're only replacing the newline (\n), so the carriage return (\r) is still there.
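One way to check this diagnosis is to run the substitution on a string with CRLF line endings and make the invisible characters visible. The following is a minimal sketch; the sample string is a hypothetical stand-in for input.txt, and the <CR>/<LF> markers are just an illustrative choice.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for input.txt, with CRLF line endings.
my $text = "\\x text\r\n\\y text text\r\ntext text\r\n\\z text\r\n";

# The substitution from the question (with $1 instead of \1).
# It consumes only the \n, so the preceding \r is left in place.
(my $result = $text) =~ s/\n([^\\])/ $1/g;

# Replace the invisible characters with visible markers for inspection.
(my $visible = $result) =~ s/\r/<CR>/g;
$visible =~ s/\n/<LF>\n/g;
print $visible;
```

If a <CR> shows up in the middle of the joined field, the stray carriage return is what many editors and viewers render as a line break.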

You can match \v for vertical whitespace (a bit more than line endings), \R for a generalized Unicode line ending, [\r\n]+ to get either (singly or together), or \r\n if you're sure they will both be there. The trick is to choose one that still works for you if the line ending changes.

Also, the \1 on the replacement side is better written as $1.
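Putting these suggestions together, here is a minimal sketch of the substitution using \R and a lookahead, so nothing needs to be put back on the replacement side. It assumes Perl 5.10 or later (for \R); the sub name join_continuations and the sample string are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Replace every line ending that is followed by something other than a
# backslash with a single space. \R matches \r\n as one unit as well as a
# lone \n, so the same pattern handles both CRLF and LF input.
sub join_continuations {
    my ($text) = @_;
    $text =~ s/\R(?=[^\\])/ /g;
    return $text;
}

# Hypothetical sample standing in for the slurped contents of input.txt.
my $sample = "\\x text\r\n\\y text text\r\ntext text\r\n\\z text\r\n";
print join_continuations($sample);
```

Line endings that precede a field label, and the final one at the end of the file, are left untouched, so the output keeps whatever line-ending style the input had.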

