How can I read lines from the end of file in Perl?

I am working on a Perl script that reads a CSV file and does some calculations. The CSV file has only two columns, something like this:

One Two
1.00 44.000
3.00 55.000

This CSV file can be very big, anywhere from 10 MB to 2 GB.

Currently I am working with a 700 MB CSV file. I tried to open it in Notepad and Excel, but it looks like no software is going to open it.

I want to read maybe the last 1000 lines of the CSV file and look at the values. How can I do that? I cannot open the file in Notepad or any other program.

If I write a Perl script, then I need to process the complete file to get to the end of it and then read the last 1000 lines.

Is there a better way to do that? I am new to Perl, and any suggestions will be appreciated.

I have searched the net and there are some scripts available, like File::Tail, but I don't know whether they will work on Windows.

The File::ReadBackwards module allows you to read a file in reverse order. This makes it easy to get the last N lines as long as you don't depend on their order. If you do, and the needed data is small enough (which it should be in your case), you could read the last 1000 lines into an array and then reverse it.
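A minimal sketch of that approach, assuming File::ReadBackwards has been installed from CPAN (it is not a core module). The sub name `last_n_lines` is made up for illustration:

```perl
use strict;
use warnings;
use File::ReadBackwards;

# Return the last $n lines of $path in their original (forward) order.
sub last_n_lines {
    my ($path, $n) = @_;

    my $bw = File::ReadBackwards->new($path)
        or die "Can't read $path: $!\n";

    my @lines;
    while (@lines < $n) {
        my $line = $bw->readline;   # last line first, then the one before it, ...
        last unless defined $line;  # reached the start of the file
        push @lines, $line;
    }
    $bw->close;

    return reverse @lines;          # undo the backwards read order
}

# Only runs when a file name is given on the command line.
print last_n_lines($ARGV[0], 1000) if @ARGV;
```

Because the module reads from the end, only the tail of the file is touched, however large the file is. Call the sub in list context, since `reverse` behaves differently in scalar context.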

On *nix, you can use the tail command.

tail -n 1000 yourfile | perl ...

That sends only the last 1000 lines to the Perl program.
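The Perl side of such a pipeline might look like the sketch below. It assumes whitespace-separated columns as in the sample data from the question, and the sub name `column_stats` is hypothetical. The demo reads from an in-memory filehandle so it runs on its own; in the pipeline you would pass `\*STDIN` instead.

```perl
use strict;
use warnings;

# Read two-column rows from a filehandle and return (row count, sum of column two).
# Rows whose second field is not a plain decimal number (e.g. the header) are skipped.
sub column_stats {
    my ($fh) = @_;
    my ($rows, $sum) = (0, 0);
    while (my $line = <$fh>) {
        my (undef, $two) = split ' ', $line;
        next unless defined $two && $two =~ /^-?\d+(?:\.\d+)?$/;
        $rows++;
        $sum += $two;
    }
    return ($rows, $sum);
}

# Demo on the sample rows from the question, held in a string.
my $sample = "One Two\n1.00 44.000\n3.00 55.000\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!\n";
my ($rows, $sum) = column_stats($fh);
close $fh;
printf "%d rows, column two sums to %.3f\n", $rows, $sum;
# prints "2 rows, column two sums to 99.000"
```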

On Windows, the GnuWin32 and UnxUtils packages both include a tail utility.

This is only tangentially related to your main question, but when you want to check whether a module such as File::Tail works on your platform, check the results from CPAN Testers. The links at the top of the module's page on CPAN Search lead you there:

[Image: File::Tail page header (source: flickr.com)]

Looking at the matrix, you see that this module does indeed have a problem on Windows on every version of Perl tested:

[Image: File::Tail test matrix (source: flickr.com)]

Without tail, a Perl-only solution isn't that unreasonable.

One way is to seek from the end of the file, then read lines from it. If you don't have enough lines, seek even further from the end and try again.

sub last_x_lines {
    my ($filename, $lineswanted) = @_;
    my ($line, $filesize, $seekpos, $numread, @lines);

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";

    $filesize = -s $filename;
    # First guess: about 50 bytes per line, but never more than the whole file.
    $seekpos = 50 * $lineswanted;
    $seekpos = $filesize if $seekpos > $filesize;
    $numread = 0;

    while ($numread < $lineswanted) {
        @lines = ();
        $numread = 0;
        seek($fh, $filesize - $seekpos, 0) or die "Can't seek in $filename: $!\n";
        <$fh> if $seekpos < $filesize; # Discard probably fragmentary line
        while (defined($line = <$fh>)) {
            push @lines, $line;
            shift @lines if ++$numread > $lineswanted;
        }
        if ($numread < $lineswanted) {
            # We didn't get enough lines. Double the amount of space to read from next time.
            if ($seekpos >= $filesize) {
                die "There aren't even $lineswanted lines in $filename - I got $numread\n";
            }
            $seekpos *= 2;
            $seekpos = $filesize if $seekpos >= $filesize;
        }
    }
    close $fh;
    return @lines;
}

PS: A better title would be something like "Reading lines from the end of a large file in Perl".

I wrote a quick backward file search in pure Perl using the following code:

#!/usr/bin/perl
use warnings;
use strict;
my ($file, $num_of_lines) = @ARGV;

my $count = 0;
my $filesize = -s $file; # used to detect reaching the start of the file while reading backward
my $offset = -2; # start just before the file's final character, assumed to be "\n"

open my $fh, '<', $file or die "Can't read $file: $!\n";

# Lines are printed last-first, i.e. in reverse file order.
while ($count < $num_of_lines && abs($offset) <= $filesize) {
    my $line = "";
    # stop at the start of the file: seeking before it (whence 2 = from the end) fails
    while (abs($offset) <= $filesize) {
        seek $fh, $offset, 2;   # negative $offset with whence 2 seeks backward from the end
        $offset -= 1;           # move back the counter
        my $char = getc $fh;
        last if $char eq "\n";  # reached the end of the previous line
        $line = $char . $line;  # otherwise prepend the character to the current line
    }

    # got the next line!
    print $line, "\n";
    $count++;
}
close $fh;

and run the script like this:

$ get-x-lines-from-end.pl ./myhugefile.log 200

perl -n -e "shift @d if (@d >= 1000); push(@d, $_); END { print @d }" < bigfile.csv

Though really, the fact that UNIX systems can simply use tail -n 1000 should convince you to simply install cygwin or colinux.

You could use the Tie::File module, I believe. It looks like this presents the file's lines as an array, so you could get the size of the array and process elements arraySize-1000 through arraySize-1.

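A rough sketch of the Tie::File idea, using only core modules. The sub name `tail_with_tie` is invented here, and the file is opened read-only so that nothing can be written back by accident:

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return the last $n lines of $path (without line endings) via a tied array.
sub tail_with_tie {
    my ($path, $n) = @_;

    tie my @rows, 'Tie::File', $path, mode => O_RDONLY
        or die "Can't tie $path: $!\n";

    my $total = @rows;                        # forces a scan to count the records
    my $first = $total > $n ? $total - $n : 0;
    my @last  = @rows[$first .. $total - 1];  # copy before untying

    untie @rows;
    return @last;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for tail_with_tie($ARGV[0], 1000);
}
```

Tie::File does not hold the whole file in memory, but asking for the array size still makes it read through the file to find where each line starts. On a very large file this is likely to be slower than seeking from the end.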

Another option would be to count the number of lines in the file, then loop through the file once and start reading in values at numberOfLines-1000.

$count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);

That would give you the number of lines (on most systems).

Modules are the way to go. However, sometimes you may be writing a piece of code that you want to run on a variety of machines that may be missing the more obscure CPAN modules. In that case, why not just run 'tail' and dump the output to a temp file from within Perl?

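#!/usr/bin/perl

# Run tail through the shell so the redirection works; die if it fails.
system('tail --lines=1000 /path/myfile.txt > tempfile.txt') == 0
    or die "tail failed: $?\n";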

You then have something that doesn't depend on a CPAN module, in case installing one is a problem.
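If the temp file is not needed for anything else, the output of tail can also be read straight from a pipe. This is only a sketch: it assumes a `tail` program is on the PATH and that your Perl supports the list form of a piped open (older Perls on Windows may not). The sub name `tail_via_pipe` is hypothetical.

```perl
use strict;
use warnings;

# Return the last $n lines of $path by reading the output of an external tail.
sub tail_via_pipe {
    my ($path, $n) = @_;

    # List-form open runs tail directly, without a shell or a temp file.
    open my $tail, '-|', 'tail', '-n', $n, $path
        or die "Can't run tail: $!\n";
    my @lines = <$tail>;
    close $tail
        or die "tail exited with status $?\n";

    return @lines;
}

# Only runs when a file name is given on the command line.
print tail_via_pipe($ARGV[0], 1000) if @ARGV;
```

Passing the arguments as a list also avoids shell quoting problems when the file name contains spaces.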

If you know the number of lines in the file, you can do

perl -ne "print if ($. > N);" filename.csv

where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with

perl -e "while (<>) {} print $.;" filename.csv
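The two one-liners can be combined into a single sub that makes both passes itself. This is a sketch using only core Perl; the name `tail_two_pass` is made up for the example.

```perl
use strict;
use warnings;

# Return the last $n lines of $path using two sequential passes over the file.
sub tail_two_pass {
    my ($path, $n) = @_;

    open my $fh, '<', $path or die "Can't read $path: $!\n";

    # Pass 1: count the lines without storing them.
    my $total = 0;
    while (defined(my $line = <$fh>)) {
        $total++;
    }

    # Pass 2: rewind, skip the first ($total - $n) lines, keep the rest.
    my $skip = $total > $n ? $total - $n : 0;
    seek($fh, 0, 0) or die "Can't rewind $path: $!\n";
    my ($seen, @lines) = (0);
    while (defined(my $line = <$fh>)) {
        push @lines, $line if ++$seen > $skip;
    }
    close $fh;

    return @lines;
}

# Only runs when a file name is given on the command line.
print tail_two_pass($ARGV[0], 1000) if @ARGV;
```

Memory use stays small because only the last lines are kept, but the whole file is read twice, so on a multi-gigabyte file the seek-based approaches should be faster.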

Without relying on tail (which is what I would probably do), if you have more than $FILESIZE [2GB?] of memory, then I'd just be lazy and do:

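my @lines = <>;
# Slice with a range (..), not a comma; clamp in case there are fewer than 1000 lines.
my $k = @lines < 1000 ? @lines : 1000;
my @lastKlines = @lines[-$k .. -1];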

Though the other answers involving tail or seek() are pretty much the way to go on this.

You should absolutely use File::Tail or, better yet, another module. It's not a script, it's a module (a programming library). It likely works on Windows. As somebody said, you can check this on CPAN Testers, or often just by reading the module documentation or simply trying it.

You selected the tail utility as your preferred answer, but that's likely to be more of a headache on Windows than File::Tail.
