I know there are plenty of questions on this site about matching multiline regexes with Perl, but I'm still having trouble figuring out how to do the following. Any help or links to relevant questions would be highly appreciated.
I have a text file, input.txt, that is structured with field labels (identified by a leading backslash) and field contents, like this:
\x text
\y text text
text text
\z text
Field contents can contain line breaks, but for further processing I need all of a field's contents to be on one line. The following script apparently does match across multiple lines; however, instead of deleting the line break it seems to reinsert it.
#!/usr/bin/perl
$/ = undef;
{
    open(my $in, "<", "input.txt") or die "impossible: $!";
    open(my $out, ">", "output.txt") or die "Can't open output.txt: $!";
    while (<$in>) {
        s/\n([^\\])/ \1/g; # delete all line breaks unless followed by backslash and replace by a single space
        print $out $_;
    }
}
It adds the space to the front (so I know it correctly finds it) but nonetheless keeps the newline character. Output looks like this:
\x text
\y text text
text text
\z text
Whereas I was hoping to get this:
\x text
\y text text text text
\z text
I think your input has a carriage return-linefeed pair. You're only replacing the newline but the carriage return is still there.
You can match \v for vertical whitespace (a bit more than line endings), \R for a generalized Unicode line ending, [\r\n]+ to get either (singly or together), or \r\n if you're sure they will both be there. The trick is to choose one that still works for you if the line ending changes.
And the \1 on the replacement side is better written as $1.
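As a minimal sketch of that advice, assuming the file really does use CRLF line endings: the substitution below swaps \n for \R and \1 for $1, and otherwise keeps the pattern from the question. The helper name join_field_lines and the in-memory sample string are hypothetical, made up for illustration; \R requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): joins continuation
# lines onto the preceding field, whatever the line-ending style.
sub join_field_lines {
    my ($text) = @_;
    # \R matches "\r\n" as a single unit (or a lone "\n", "\r", ...).
    # The captured character is anything except a backslash, so a line
    # break that precedes a new field label is left untouched.
    $text =~ s/\R([^\\])/ $1/g;
    return $text;
}

# Sample data mirroring the question, with CRLF line endings (an assumption).
my $sample = "\\x text\r\n\\y text text\r\ntext text\r\n\\z text\r\n";
print join_field_lines($sample);
```

To apply this to a file, the slurped contents (read with $/ undefined, as in the question) can be passed to the helper. Note that a bare \v in place of \R would probably not be enough with this particular pattern: on CRLF input, \v can match the \r alone and the capture group then picks up the \n, so the newline survives. \R, \r\n, or [\r\n]+ avoid that.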