
Trying to understand this s/\r?\n$// Perl regex

To all Perl gurus! I have the following snippet of code, and there is one specific line that I am trying to understand. I have been reading around and managed to work out that it's a Perl regex, but I haven't been able to understand what each part is doing. Correct me if I am wrong in what I am about to say.

This particular part reads the EDID content from files that are in hex. I believe the previous developer was trying to strip out any spaces and newlines, but I am not completely sure.

for (my $int = 1; $int < 9; $int++) {
    my $line = <$info>;
    $line =~ s/\r?\n$//;
    chomp $line;
    $line =~ s/\s+//g;
    if ( $line eq "00000000000000000000000000000000" ) {
        print "bad EDID information in file $file --- all 0's\r\n";
        close $info;
        close $OUTFILE;
        exit 1;
    }

    print $OUTFILE $line;
}

Now, this part is the one that throws me off:

$line =~ s/\r?\n$//;

What I want to understand is what s/\r?\n$// is doing. I believe \n is a newline, but I am not sure about the other parts. Any comment or help is always welcome.

In case you do not already know, s/// is the substitution operator.

The pattern matches an optional carriage return followed by a newline at the end of the string, and the empty replacement deletes whatever was matched.

\r?  # '\r' (carriage return) (optional)
\n   # '\n' (newline)
 $   # before an optional \n, and the end of the string
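To see the pattern in action, here is a minimal sketch that applies it to two made-up lines, one ending in CR LF and one ending in LF only. The sample strings and variable names are assumptions for illustration and are not taken from the question's EDID files.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one with a Windows (CR LF) ending,
# one with a Unix (LF) ending.
my @samples = ("00FFFFFF\r\n", "00FFFFFF\n");

my @cleaned;
for my $line (@samples) {
    # Copy the line, then strip an optional CR plus the final LF from the copy.
    (my $copy = $line) =~ s/\r?\n$//;
    push @cleaned, $copy;
}

# Brackets make any leftover invisible characters easier to spot.
print "[$_]\n" for @cleaned;
```

Both lines should come out identical, with no trailing CR or LF left behind.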

hwnd's answer is factually correct, but it does not explain why this regex is there.

Windows and Unix (including OS X) use different ways to express the end of a line. That regex deletes both kinds, ensuring it will work no matter which type of machine produced the file or which type is reading it.

Windows and many Internet protocols use a carriage return (ASCII 015) followed by a line feed (ASCII 012); this comes from when computer displays were electric typewriters and had to be told to move the print head (the carriage) back to the first column (carriage return) and then advance a line (line feed). Unix uses just a line feed (ASCII 012). Carriage return in a regex is \r or \015. Line feed (aka newline) is \n or \012.
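The octal codes can be checked directly. This is a small sketch that assumes an ASCII-based platform (the values differ on EBCDIC systems); the variable names are made up for the example.

```perl
use strict;
use warnings;

# Octal character codes, assuming an ASCII-based platform.
my $cr_octal = sprintf("%03o", ord("\r"));
my $lf_octal = sprintf("%03o", ord("\n"));
print "CR = $cr_octal, LF = $lf_octal\n";

# In a regex, the octal escapes match the same characters as \r and \n.
my $same = ("\r\n" =~ /\A\015\012\z/) ? 1 : 0;
print "octal escapes match CR LF: $same\n";
```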

The $ is redundant and could probably be removed: a line read with <$info> can only have its newline at the very end.

The call to chomp is redundant. chomp removes the value of $/, which defaults to "\n". On Windows, Perl's default text-mode input translates \r\n to \n as the file is read, so chomp copes with native files on either system. However, if you're working with a Windows file on a Unix machine, chomp will remove only the \n and leave the \r behind; it does not adapt to the type of file. The regex is safer.
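The difference shows up on a line that still carries both characters, which is how a Windows file looks when read on Unix. A minimal sketch, using a made-up in-memory string rather than a real file:

```perl
use strict;
use warnings;

# Hypothetical line as it arrives when no CRLF translation is applied.
my $crlf_line = "1A2B\r\n";

my $chomped = $crlf_line;
chomp $chomped;    # removes only the trailing "\n" (the default $/)

# The regex removes "\r\n" or a bare "\n".
(my $stripped = $crlf_line) =~ s/\r?\n$//;

printf "chomp left %d chars, regex left %d chars\n",
    length($chomped), length($stripped);
```

Here chomp leaves five characters because the carriage return survives, while the regex leaves the four hex digits only.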

$line =~ s/\s+//g; The /g makes it match as many times as possible, removing all whitespace anywhere in the line. Since carriage returns and newlines are whitespace, this makes both chomp and s/\r?\n$// redundant.

All three lines could be reduced to $line =~ s{\s+}{}g.
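As a quick check of that claim, the sketch below runs the original three steps and the single substitution on the same made-up input and compares the results. The sample string is an assumption; real EDID dump lines may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical line of hex bytes with spaces and a CR LF ending.
my $raw = "00 FF FF FF FF FF FF 00\r\n";

# The original three steps.
my $three = $raw;
$three =~ s/\r?\n$//;
chomp $three;
$three =~ s/\s+//g;

# The single substitution.
(my $one = $raw) =~ s{\s+}{}g;

print "same result: ", ($three eq $one ? "yes" : "no"), "\n";
```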

Your predecessor has written the equivalent of a chomp that is intended to work on both Windows and Linux text files. The former has CR LF line endings "\r\n" and the latter has just LF "\n".

A better way to write this, assuming you're not interested in trailing tabs or spaces, would be s/\s+$//, since both CR and LF are "whitespace".

Better still, if you can guarantee that you are running on version 10 or later of Perl 5 (put use 5.010 at the top of the program) would be s/\s+\z//.

Or, if you want to retain trailing spaces but remove the line terminator(s), s/[\r\n]+\z// will do that for you, and will also cope with old-fashioned Mac text files, which have just CR at the end.
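The alternatives behave differently when a line has trailing spaces or a CR-only ending. The following sketch compares them on two made-up strings; the inputs and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: trailing spaces plus a CR LF terminator.
my $line = "AB CD  \r\n";

(my $trim_all   = $line) =~ s/\s+\z//;       # drops trailing spaces and the terminator
(my $keep_space = $line) =~ s/[\r\n]+\z//;   # drops only the terminator

# Hypothetical old-style Mac line, terminated by CR only.
my $mac = "AB CD\r";

(my $mac_orig = $mac) =~ s/\r?\n$//;         # there is no "\n", so nothing is removed
(my $mac_new  = $mac) =~ s/[\r\n]+\z//;      # the CR is removed

print "[$trim_all] [$keep_space]\n";
print "original pattern left a CR on the Mac line\n" if $mac_orig =~ /\r\z/;
```

Which one fits depends on whether trailing spaces in the data are meaningful. For the hex lines in the question they are stripped anyway by the later s/\s+//g.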


 