
CR vs LF Perl parsing

I have a Perl script which parses a text file and breaks it up line by line into an array. It works fine when each line is terminated by LF, but when lines are terminated by CR my script does not handle them properly. How can I modify this line to fix this?

my @allLines = split(/^/, $entireFile);

Edit: My file has a mixture of lines ending in either LF or CR. The split just collapses all the lines together when they end in CR.

Perl can handle both CRLF and LF line-endings with the built-in :crlf PerlIO layer:

open(my $in, '<:crlf', $filename) or die "Cannot open $filename: $!";

will automatically convert CRLF line endings to LF, and leave LF line endings unchanged. But CR-only files are the odd man out. If you know that the file uses CR only, then you can set $/ to "\r" and it will read line by line (but it won't change the CR to an LF).
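As a minimal sketch of the CR-only case, the snippet below writes a small sample file and reads it back with $/ set to "\r". The file contents and the use of File::Temp are assumptions made purely so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small CR-only sample file (hypothetical data for illustration).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "first\rsecond\rthird\r";
close $tmp or die "Cannot close $filename: $!";

my @lines;
{
    # Localize $/ so the change only applies inside this block.
    local $/ = "\r";
    open(my $in, '<:raw', $filename) or die "Cannot open $filename: $!";
    while (my $line = <$in>) {
        chomp $line;    # chomp strips the current $/, here the trailing CR
        push @lines, $line;
    }
    close $in;
}

print "$_\n" for @lines;
```

Using local keeps the modified record separator from leaking into other reads elsewhere in the program.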

If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module. Then you can say:

open(my $in, '<:raw:eol(LF)', $filename) or die "Cannot open $filename: $!";

and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.

Another option is to set $/ to undef, which will read the entire file in one slurp. Then split it on /\r\n?|\n/. But that assumes that the file is small enough to fit in memory.
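The slurp-and-split approach might look roughly like this. The sample file with mixed endings is invented for the demonstration, and File::Temp is used only to keep the sketch runnable on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file mixing CRLF, CR and LF endings (hypothetical data).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ntwo\rthree\nfour";
close $tmp or die "Cannot close $filename: $!";

my $entireFile = do {
    local $/;    # undef: the next read returns the whole file
    open(my $in, '<:raw', $filename) or die "Cannot open $filename: $!";
    <$in>;
};

# \r\n? matches CRLF or a lone CR; the second alternative matches a lone LF.
my @allLines = split /\r\n?|\n/, $entireFile;

print scalar(@allLines), " lines\n";
```

Unlike split(/^/, ...), this split removes the line terminators from each element.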

If you have mixed line endings, you can normalize them by matching a generalized line ending:

 use v5.10;

 $entireFile =~ s/\R/\n/g;

You can also open a filehandle on a string and read lines just like you would from a file:

 open my $fh, '<', \ $entireFile;
 my @lines = <$fh>;
 close $fh;

You can even open the string with the PerlIO layers that cjm shows.
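Putting the two ideas together, here is a small sketch that normalizes a string with \R and then reads it through an in-memory filehandle. The sample string is made up, and note that \R also matches other vertical whitespace such as form feed, which may or may not be what you want.

```perl
use strict;
use warnings;
use v5.10;    # \R is available from Perl 5.10 onward

# Hypothetical input mixing CR, CRLF and LF endings.
my $entireFile = "alpha\rbeta\r\ngamma\ndelta\r";

# Work on a copy so the original string is left untouched.
(my $normalized = $entireFile) =~ s/\R/\n/g;

# Read the normalized string line by line through an in-memory filehandle.
open(my $fh, '<', \$normalized) or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

chomp @lines;    # strip the now-uniform LF terminators
print "$_\n" for @lines;
```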

When doing the split, you can simply handle the different line endings directly, for example:

my @allLines = split(/\r\n|\r|\n/, $entireFile);

Perl will automatically split the input into lines if you read with <>, but you need to change $/ to "\r".

$/ is the "input record separator". See perldoc perlvar for details.

There is no way to change what a regular expression considers to be the end of a line for the ^ and $ anchors: it is always the newline character (\n). That is why split(/^/, ...) never breaks after a lone CR.
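To illustrate the point, the sketch below compares split(/^/, ...) with a lookbehind-based pattern that also breaks after a lone CR. The lookbehind pattern is one possible workaround rather than anything from the answers above, and the sample string is invented. Like split(/^/, ...), it keeps the terminator on each element.

```perl
use strict;
use warnings;

# Hypothetical input with both CR and LF terminated lines.
my $entireFile = "a\rb\nc\rd\n";

# ^ only recognizes \n, so CR-terminated lines stay glued to the next line.
my @byAnchor = split /^/, $entireFile;

# Split after every LF, or after a CR that is not part of a CRLF pair.
my @byEnding = split /(?<=\n)|(?<=\r)(?!\n)/, $entireFile;

print scalar(@byAnchor), " vs ", scalar(@byEnding), "\n";
```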
