Matching and deleting newline characters in a multiline Perl regex

Question

I know there are tons of questions on this site about matching multiline regexes with Perl, but I'm still having trouble figuring out how to do the following. Any help or links to the relevant questions would be highly appreciated.

I have a text file input.txt that is structured with a field-label (identified by a backslash) and field-contents, like this:

\x text
\y text text
text text
\z text

Field-contents can contain line breaks, but for further processing I need to make sure that all field contents are on one line. The following script appears to match across multiple lines correctly; however, it doesn't delete the line break but instead seems to reinsert it.

#!/usr/bin/perl

$/ =undef; 

{
open(my $in, "<", "input.txt") or die "impossible: $!";
open(my $out, ">", "output.txt") or die "Can't open output.txt: $!"; 

while (<$in>) {
    s/\n([^\\])/ \1/g; # delete all line breaks unless followed by backslash and replace by a single space
    print $out $_ ; 
    }       
}

It adds the space to the front (so I know it correctly finds it) but nonetheless keeps the newline character. Output looks like this:

\x text
\y text text
 text text
\z text

Whereas I was hoping to get this:

\x text
\y text text text text
\z text

Answer 1

I think your input has a carriage return-linefeed pair. You're only replacing the newline but the carriage return is still there.
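One way to check this is to make the control characters visible. The sketch below is only an illustration: the sample string is my assumption of what input.txt looks like with CRLF line endings, and the `<CR>`/`<LF>` markers are arbitrary labels. It applies the substitution from the question (written with `$1`) and prints what is left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample: the question's data, saved with CRLF (\r\n) line endings.
my $text = "\\x text\r\n\\y text text\r\ntext text\r\n\\z text\r\n";

# The substitution from the question. It consumes only the "\n",
# so any "\r" in front of it stays in the string.
(my $attempt = $text) =~ s/\n([^\\])/ $1/g;

# Replace the invisible characters with visible labels for inspection.
(my $visible = $attempt) =~ s/\r/<CR>/g;
$visible =~ s/\n/<LF>\n/g;

print $visible;
```

If the second line of the output shows a `<CR>` between "text text" and " text text", the carriage return is what still breaks the line when you view the file.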

You can match \v for vertical whitespace (a bit more than line endings), \R for a generalized Unicode line ending, [\r\n]+ to get either (singly or together), or \r\n if you're sure they will both be there. The trick is to choose one that works for you if the line ending changes.

Also, the \1 on the replacement side is better written as $1.
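Putting both suggestions together, here is a minimal sketch using \R (available since Perl 5.10) and $1. The helper name `join_field_lines` and the in-memory sample are mine, not part of the question; the sample assumes CRLF line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join continuation lines onto the preceding field.
# \R matches "\r\n" as a single unit (as well as a lone "\n" or "\r"),
# so no stray carriage return is left behind.
sub join_field_lines {
    my ($text) = @_;
    # Replace a line ending with a space unless the next line starts with a backslash.
    $text =~ s/\R([^\\])/ $1/g;
    return $text;
}

# Assumed sample input with CRLF line endings.
my $sample = "\\x text\r\n\\y text text\r\ntext text\r\n\\z text\r\n";
print join_field_lines($sample);
```

In the script from the question, the same substitution can replace the one inside the while loop. Line endings that are not deleted are left as they were, so a CRLF file stays CRLF on the remaining lines.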

solution1
4 ACCEPTED 2018-08-26 20:02:51
