
How to read a CR-only file line by line with Perl?

I'm trying to read a file that uses only CR as its line delimiter. I'm using Mac OS X and Perl v5.8.8. The script should run on every platform and handle every kind of line delimiter (CR, LF, CRLF).

My current code is the following:

open(FILE, "test.txt");

while($record = <FILE>){
    print $record;
}

close(FILE);

This currently prints only the last line (or worse). What is going on? Obviously, I would like to avoid converting the file. Is it possible?

You can set the delimiter using the special variable $/:

local $/ = "\r";  # CR; use "\r\n" for CRLF or "\n" for LF
my $line = <FILE>;

See perldoc perlvar for further information.
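
As a rough sketch of how this could look in a complete loop: the helper name read_records is made up for illustration, and the ':raw' layer is my own addition so that no platform translates line endings before Perl sees them. Because $/ is set with local inside the sub, the old value comes back when the sub returns, and chomp removes whatever $/ currently holds.

```perl
use strict;
use warnings;

# read_records is a hypothetical helper, not part of the original answer.
# It returns every record of $path, split on the separator $sep.
sub read_records {
    my ($path, $sep) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open \"$path\": $!\n";
    local $/ = $sep;              # restored automatically when the sub returns
    my @records;
    while (my $line = <$fh>) {
        chomp $line;              # chomp strips the current value of $/
        push @records, $line;
    }
    close($fh);
    return @records;
}
```

For the CR-only file from the question this would be called as read_records("test.txt", "\r").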

Another solution that works with all kinds of line breaks is to slurp the whole file at once and then split it into lines using a regex:

local $/ = undef;
my $content = <FILE>;
my @lines = split /\r\n|\n|\r/, $content;

You shouldn't do that with very large files though, as the whole file is read into memory. Note that setting $/ to the undefined value disables the line delimiter, so a single read returns everything up to the end of the file.
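
One way to keep the undefined $/ from leaking into the rest of the script is to confine it to a do block. The following is only a sketch; slurp_lines is a name invented here, and the ':raw' layer is an assumption to keep the bytes untouched on every platform.

```perl
use strict;
use warnings;

# slurp_lines is a hypothetical helper: it reads the whole file into memory
# and returns its lines with the terminators removed.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open \"$path\": $!\n";
    my $content = do { local $/; <$fh> };   # $/ is undef only inside this block
    close($fh);
    # "\r\n" is listed first so that a CRLF pair counts as one line break.
    return split /\r\n|\n|\r/, $content;
}
```

Keep in mind that split drops trailing empty fields, so blank lines at the very end of the file do not show up in the result.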

I solved a more general problem that could be useful here:

How to parse a big file line by line when it may use any line delimiter (CR/CRLF/LF) and the delimiter is not known beforehand.

A 'big' file means that it is not OK to read the whole file into one variable. Here the function 'detectEndOfLine' takes the name of a file and returns either '\r' or '\n', whichever is used for line endings (it searches for a '\r' or '\n' character, one character at a time, starting from the end of the file). For a CRLF file it returns '\n', and the substitution in the loop removes the leftover '\r'.

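my $file = "test.txt";
local $/ = detectEndOfLine($file);   # "\r", "\n", or undef if no line ending was found
open(IN, $file) or die "Can't open file \"$file\" for reading: $!\n";
while(<IN>) {
    s/(?:\r\n|\n|\r)\z//;   # strip only the trailing line ending, whichever kind it is
    print "$_\n";
}
close(IN);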

sub detectEndOfLine {
    my ($file) = @_;

    open(my $fh, '<', $file) or die "Can't open file \"$file\" for reading: $!\n";
    binmode($fh);
    my $size = -s $fh;

    # Walk backwards from the last byte until a CR or LF turns up.
    for(my $i = $size - 1; $i >= 0; --$i) {
        seek($fh, $i, 0);
        read($fh, my $sym, 1) or last;   # read exactly one byte
        return $sym if( $sym eq "\n" or $sym eq "\r" );
    }
    return undef;   # no line ending anywhere in the file
}
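
If detecting the delimiter up front is not wanted, a different technique (not the one used above) is to read fixed-size chunks and split them on any of the three line endings, which also copes with files that mix them. This is only a sketch under my own assumptions: each_line and its callback interface are invented for illustration, and 64 KB is an arbitrary default chunk size. Memory use stays bounded by the chunk size plus the longest single line.

```perl
use strict;
use warnings;

# each_line is a hypothetical helper: it calls $callback once per line,
# with the line terminator already removed.
sub each_line {
    my ($path, $callback, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    open(my $fh, '<:raw', $path) or die "Can't open file \"$path\" for reading: $!\n";

    my $buffer = '';
    while (read($fh, my $chunk, $chunk_size)) {
        $buffer .= $chunk;
        # Emit every line whose terminator is completely inside the buffer.
        # A "\r" at the very end of the buffer is held back, because the
        # next chunk may begin with the "\n" of a CRLF pair.
        while ($buffer =~ s/\A([^\r\n]*)(?:\r\n|\n|\r(?!\z))//) {
            my $line = $1;
            $callback->($line);
        }
    }

    # At end of file, flush whatever is left (a final line without a
    # terminator, or one that ended in a held-back "\r").
    if (length $buffer) {
        $buffer =~ s/\r\z//;
        $callback->($buffer);
    }
    close($fh);
}
```

A call such as each_line("test.txt", sub { print "$_[0]\n" }) would then print every line with a normalized "\n" ending.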
