
In Perl, how to match two consecutive Carriage Returns?

Hi StackOverflow buddies,

I'm on Windows. I have a data file in which, for reasons I don't know, every "Carriage Return + New Line" sequence became "Carriage Return + Carriage Return + New Line". (190128 edit:) For example:

When viewing the file as plain text, it is:

[Image: the text file viewed as plain text, with non-printable characters]

When viewing the same file in hex mode, it is:

[Image: the text file in hex mode, where the double "0D" is visible]

For practical purposes I need to remove the extra "0D" from each double "0D", turning " .... 30 30 0D 0D 0A 30 30 .... " into " .... 30 30 0D 0A 30 30 .... ".

190129 edit: To make sure my problem can be reproduced, I uploaded my data file to GitHub (download and unzip it before use; in a binary/hex editor you can see 0D 0D 0A in the first line): https://github.com/katyusza/hello_world/blob/master/ram_init.zip

I used the following Perl script to remove the extra Carriage Return, but to my astonishment my regex just does NOT work! My entire code is (190129 edit: pasted the entire Perl script here):

use warnings            ;
use strict              ;
use File::Basename      ;

#-----------------------------------------------------------
# command line handling, file open \ create
#-----------------------------------------------------------

# Capture input filename from command line:
my $input_fn = $ARGV[0] or
die "Should provide input file name at command line!\n";

# Parse input file name, and generate output file name:
my ($iname, $ipath, $isuffix) = fileparse($input_fn, qr/\.[^.]*/);
my $output_fn = $iname."_pruneNonPrintable".$isuffix;

# Open input file:
open (my $FIN, "<", $input_fn) or die "Open file error $!\n";

# Create output file:
open (my $FO, ">", $output_fn) or die "Create file error $!\n";


#-----------------------------------------------------------
# Read input file, search & replace, write to output
#-----------------------------------------------------------

# Read all lines in one go:
$/ = undef;

# Read entire file into variable:
my $prune_txt = <$FIN> ;

# Do match & replace:
 $prune_txt =~ s/\x0D\x0D/\x0D/g;          # does NOT work.
# $prune_txt =~ s/\x0d\x0d/\x30/g;          # does NOT work.
# $prune_txt =~ s/\x30\x0d/\x0d/g;          # works.
# $prune_txt =~ s/\x0d\x0d\x0a/\x0d\x0a/gs; # does NOT work.

# Write the result to the output file:
print $FO $prune_txt  ;

# Close files:
close($FIN)     ;
close($FO)      ;

I did everything I could to match two consecutive Carriage Returns, but failed. Can anyone please point out my mistake, or tell me the right way to go? Thanks in advance!

On Windows, file handles have a :crlf layer given to them by default.

  • This layer converts CR LF to LF on read.
  • This layer converts LF to CR LF on write.
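The effect of the layer can be seen in a small experiment. The sketch below is only an illustration: the temporary file, the helper name slurp_with, and the sample bytes are all made up for the demo. It requests :crlf explicitly so that it behaves the same way on non-Windows systems, where the layer is not applied by default.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the bytes "00" CR CR LF "00" with no translation at all.
my ($tmp_fh, $tmp_fn) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "00\r\r\n00";
close($tmp_fh) or die "Can't close '$tmp_fn': $!\n";

# Hypothetical helper: slurp a file through the given open mode.
sub slurp_with {
    my ($mode, $fn) = @_;
    open(my $fh, $mode, $fn) or die "Can't open '$fn': $!\n";
    my $data = do { local $/; <$fh> };
    close($fh);
    return $data;
}

my $via_crlf = slurp_with('<:crlf', $tmp_fn);   # what a default Windows handle sees
my $via_raw  = slurp_with('<:raw',  $tmp_fn);   # the bytes actually on disk

printf "crlf: %d chars, CR CR %s\n", length($via_crlf),
    ($via_crlf =~ /\r\r/ ? "present" : "absent");
printf "raw:  %d chars, CR CR %s\n", length($via_raw),
    ($via_raw =~ /\r\r/ ? "present" : "absent");
```

Through the :crlf layer the second CR is absorbed together with the LF, so the string handed to the regex no longer contains two CRs in a row, which would explain why s/\x0D\x0D/\x0D/g finds nothing to replace.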

Solution 1: Compensate for the :crlf layer.

You'd use this solution if you want to end up with system-appropriate line endings.

# ... read ...      # CR CR LF ⇒ CR LF
s/\r+\n/\n/g;       # CR LF    ⇒ LF
# ... write ...     # LF       ⇒ CR LF
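Put together, Solution 1 might look like the sketch below. The sub names strip_cr_before_lf and fix_file_text are made up for illustration, and the plain < and > modes are kept on purpose so that the platform's default layers do the final conversion.

```perl
use strict;
use warnings;

# Remove every run of CRs that directly precedes an LF.
sub strip_cr_before_lf {
    my ($text) = @_;
    $text =~ s/\r+\n/\n/g;
    return $text;
}

# Hypothetical helper: read with the default layers, fix, write with the default layers.
sub fix_file_text {
    my ($in_fn, $out_fn) = @_;

    open(my $in, '<', $in_fn) or die "Can't open '$in_fn': $!\n";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close($in);

    open(my $out, '>', $out_fn) or die "Can't create '$out_fn': $!\n";
    print {$out} strip_cr_before_lf($text);
    close($out) or die "Can't close '$out_fn': $!\n";
    return;
}
```

On Windows the default :crlf layer turns each LF back into CR LF on output; elsewhere the output keeps bare LF, which is what "system-appropriate" means here.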

Solution 2: Remove the :crlf layer.

You'd use this solution if you want to end up with CR LF unconditionally.

Use <:raw and >:raw instead of < and > as the mode.

# ... read ...      # CR CR LF ⇒ CR CR LF
s/\r*\n/\r\n/g;     # CR CR LF ⇒ CR LF
# ... write ...     # CR LF    ⇒ CR LF
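As a fuller sketch of Solution 2, the code below reads and writes the bytes untouched and does all of the line-ending work in the substitution. The sub names normalize_to_crlf and fix_file_raw are made up for illustration.

```perl
use strict;
use warnings;

# Replace any run of CRs (possibly empty) followed by LF with exactly CR LF.
sub normalize_to_crlf {
    my ($bytes) = @_;
    $bytes =~ s/\r*\n/\r\n/g;
    return $bytes;
}

# Hypothetical helper: read raw, fix line endings, write raw.
sub fix_file_raw {
    my ($in_fn, $out_fn) = @_;

    open(my $in, '<:raw', $in_fn) or die "Can't open '$in_fn': $!\n";
    my $bytes = do { local $/; <$in> };   # slurp the whole file
    close($in);

    open(my $out, '>:raw', $out_fn) or die "Can't create '$out_fn': $!\n";
    print {$out} normalize_to_crlf($bytes);
    close($out) or die "Can't close '$out_fn': $!\n";
    return;
}
```

Because neither handle carries the :crlf layer, the result is the same on every platform. Note that this pattern also turns a bare LF into CR LF, which matches the "CR LF unconditionally" goal.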

The first of your regexes appears to work fine for me, which means that there may be an issue in some other piece of code. Please provide a Minimal, Complete, and Verifiable Example, which means including sample input data and so on.

$ perl -wMstrict -e 'print "Foo\r\r\nBar\r\r\n"' >test.txt
$ hexdump -C test.txt 
00000000  46 6f 6f 0d 0d 0a 42 61  72 0d 0d 0a              |Foo...Bar...|
0000000c
$ cat test.pl 
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dump;

my $filename = 'test.txt';
open my $fh, '<:raw:encoding(ASCII)', $filename or die "$filename: $!";
my $prune_txt = do { local $/; <$fh> }; # slurp file
close $fh;

dd $prune_txt;
$prune_txt =~ s/\x0D\x0D/\x0D/g;
dd $prune_txt;

$ perl test.pl
"Foo\r\r\nBar\r\r\n"
"Foo\r\nBar\r\n"

By the way, it's not immediately obvious to me which encoding your file is using. In the above example, you may need to adjust the :encoding(...) layer appropriately.
