In Perl, how to match two consecutive Carriage Returns?

Question

Hi StackOverflow buddies,

I'm on Windows. I have a data file in which, for reasons I don't know, every "Carriage Return + New Line" sequence became "Carriage Return + Carriage Return + New Line". (190128 edit:) For example:

When viewing the file as plain text, it is:

When viewing the same file in hex mode, it is:

For practical purposes I need to remove the extra "0D" from each doubled "0D", that is, change " .... 30 30 0D 0D 0A 30 30 .... " to " .... 30 30 0D 0A 30 30 .... ".

190129 edit: To make sure my problem can be reproduced, I uploaded my data file to GitHub (download and unzip it before use; in a binary/hex editor you can see 0D 0D 0A in the first line): https://github.com/katyusza/hello_world/blob/master/ram_init.zip

I used the following Perl script to remove the extra Carriage Return, but to my astonishment my regex just does NOT work! My entire code is (190129 edit: pasted the entire Perl script here):

use warnings            ;
use strict              ;
use File::Basename      ;

#-----------------------------------------------------------
# command line handling, file open \ create
#-----------------------------------------------------------

# Capture input filename from command line:
my $input_fn = $ARGV[0] or
die "Should provide input file name at command line!\n";

# Parse input file name, and generate output file name:
my ($iname, $ipath, $isuffix) = fileparse($input_fn, qr/\.[^.]*/);
my $output_fn = $iname."_pruneNonPrintable".$isuffix;

# Open input file:
open (my $FIN, "<", $input_fn) or die "Open file error $!\n";

# Create output file:
open (my $FO, ">", $output_fn) or die "Create file error $!\n";


#-----------------------------------------------------------
# Read input file, search & replace, write to output
#-----------------------------------------------------------

# Read all lines in one go:
$/ = undef;

# Read entire file into variable:
my $prune_txt = <$FIN> ;

# Do match & replace:
 $prune_txt =~ s/\x0D\x0D/\x0D/g;          # does NOT work.
# $prune_txt =~ s/\x0d\x0d/\x30/g;          # does NOT work.
# $prune_txt =~ s/\x30\x0d/\x0d/g;          # works.
# $prune_txt =~ s/\x0d\x0d\x0a/\x0d\x0a/gs; # does NOT work.

# Write the result to the output file:
print $FO $prune_txt  ;

# Close files:
close($FIN)     ;
close($FO)      ;

I did everything I could to match two consecutive Carriage Returns, but failed. Can anyone please point out my mistake, or tell me the right way to go? Thanks in advance!

Answer 1

On Windows, file handles have a :crlf layer given to them by default.

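This layer converts CR LF to LF on read.
This layer converts LF to CR LF on write.

So by the time your substitution runs, each CR CR LF in the file has already been read as CR LF, and there is no CR CR left for the regex to match.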
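
A minimal sketch of that effect, assuming a throwaway temp file and made-up sample data. The :crlf layer is requested explicitly on the read handle so that the behaviour Windows applies by default can be reproduced on any platform:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: two lines, each ending in CR CR LF.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";

# Write the bytes untouched (:raw disables newline translation).
open my $raw, '>:raw', $path or die "open $path: $!";
print {$raw} "00\r\r\n11\r\r\n";
close $raw or die "close $path: $!";

# Read back through :crlf, as a default Windows handle would.
open my $in, '<:crlf', $path or die "open $path: $!";
my $text = do { local $/; <$in> };
close $in;

# Count how many CR CR pairs the regex engine can still see.
my $pairs = () = $text =~ /\r\r/g;
printf "CR CR pairs visible after :crlf read: %d\n", $pairs;
```

The count printed here is zero: the layer has already consumed one CR of each pair together with the LF.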

Solution 1: Compensate for the :crlf layer.

You'd use this solution if you want to end up with system-appropriate line endings.

# ... read ...      # CR CR LF ⇒ CR LF
s/\r+\n/\n/g;       # CR LF    ⇒ LF
# ... write ...     # LF       ⇒ CR LF
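
As a rough end-to-end sketch of Solution 1 (file names and sample data are assumptions for illustration, and :crlf is spelled out on both handles only so the example behaves the same outside Windows):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($in_path, $out_path) = ("$dir/in.txt", "$dir/out.txt");

# Hypothetical input with CR CR LF endings, written without translation.
open my $raw, '>:raw', $in_path or die "open $in_path: $!";
print {$raw} "00\r\r\n11\r\r\n";
close $raw or die "close $in_path: $!";

# Explicit :crlf on both handles mirrors the Windows default.
open my $in,  '<:crlf', $in_path  or die "open $in_path: $!";
open my $out, '>:crlf', $out_path or die "open $out_path: $!";

my $text = do { local $/; <$in> };   # CR CR LF arrives as CR LF
$text =~ s/\r+\n/\n/g;               # drop the leftover CR(s)
print {$out} $text;                  # the layer turns LF back into CR LF
close $in;
close $out or die "close $out_path: $!";

# Inspect the bytes that were actually written.
open my $check, '<:raw', $out_path or die "open $out_path: $!";
my $result = do { local $/; <$check> };
close $check;
```

On a default Windows Perl, plain "<" and ">" should behave the same way as the explicit :crlf modes used here.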

Solution 2: Remove the :crlf layer.

You'd use this solution if you want to end up with CR LF unconditionally.

Use <:raw and >:raw instead of < and > as the mode.

# ... read ...      # CR CR LF ⇒ CR CR LF
s/\r*\n/\r\n/g;     # CR CR LF ⇒ CR LF
# ... write ...     # CR LF    ⇒ CR LF
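
A sketch of Solution 2 as a complete script. The helper name normalize_crlf, the file names, and the sample data are all hypothetical and not part of the original code:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $in_path to $out_path with CR LF line endings.
sub normalize_crlf {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:raw', $in_path  or die "open $in_path: $!";
    open my $out, '>:raw', $out_path or die "open $out_path: $!";
    my $text = do { local $/; <$in> };  # bytes arrive untouched
    $text =~ s/\r*\n/\r\n/g;            # any run of CRs plus LF becomes CR LF
    print {$out} $text;
    close $in;
    close $out or die "close $out_path: $!";
    return;
}

# Demo on made-up data with CR CR LF endings.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/in.txt", "$dir/out.txt");
open my $fh, '>:raw', $src or die "open $src: $!";
print {$fh} "00\r\r\n11\r\r\n";
close $fh or die "close $src: $!";

normalize_crlf($src, $dst);

# Inspect the bytes that were actually written.
open my $check, '<:raw', $dst or die "open $dst: $!";
my $result = do { local $/; <$check> };
close $check;
```

Note that \r*\n also matches a bare LF, so lines that had no CR at all come out with CR LF as well.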

Answer 2

The first of your regexes appears to work fine for me, which means that there may be an issue in some other piece of code. Please provide a Minimal, Complete, and Verifiable Example, which means including sample input data and so on.

$ perl -wMstrict -e 'print "Foo\r\r\nBar\r\r\n"' >test.txt
$ hexdump -C test.txt 
00000000  46 6f 6f 0d 0d 0a 42 61  72 0d 0d 0a              |Foo...Bar...|
0000000c
$ cat test.pl 
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dump;

my $filename = 'test.txt';
open my $fh, '<:raw:encoding(ASCII)', $filename or die "$filename: $!";
my $prune_txt = do { local $/; <$fh> }; # slurp file
close $fh;

dd $prune_txt;
$prune_txt =~ s/\x0D\x0D/\x0D/g;
dd $prune_txt;

$ perl test.pl
"Foo\r\r\nBar\r\r\n"
"Foo\r\nBar\r\n"

By the way, it's not immediately obvious to me which encoding your file is using. In the above example, you may need to adjust the :encoding(...) layer appropriately.

2 answers

solution1
2 ACCEPTED 2019-01-29 06:36:40

solution2
1 2019-01-28 19:31:41