How to "jump" to a line of a file, rather than read file line by line, using Perl

I am opening a file containing a single, very long column. I want to retrieve just a short segment from it, starting at one specified line and ending at another. Currently, my script reads the file line by line until the desired lines are found. I am using:

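my ( $from, $to ) = ( 3, 6 );    # placeholder values: first and last line wanted (1-based)
my $count = 1;
my @seq = ();

# SEQUENCE is assumed to be a filehandle already opened for reading
while ( my $line = <SEQUENCE> ) {
    chomp $line;
    print "$line for $count\n";    # debug output

    # keep only the lines inside the requested range
    if ( $count >= $from && $count <= $to ) {
        push( @seq, $line );
    }
    $count++;
}
print "seq is: @seq\n";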

Input looks like:

A
G
T
C
A
G
T
C
.
.
.

How might I "jump" to where I want to be?

You'll need to use seek to move to the correct portion of the file. Reference: http://perldoc.perl.org/functions/seek.html

seek works on bytes, not on lines, so it is generally not an option when you need to seek by line number. However, since your lines have a fixed length (2 or 3 bytes each, depending on your platform's EOL encoding), you can multiply the line length by the 0-indexed number of the line you want, and you'll be at the correct location for reading.
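As a rough sketch of that arithmetic, assuming every line really does occupy the same number of bytes: the helper name read_fixed_lines and its $reclen parameter are made up for illustration and are not part of Perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $from..$to (1-based, inclusive) from a file
# in which every line occupies exactly $reclen bytes, line ending included.
sub read_fixed_lines {
    my ( $path, $from, $to, $reclen ) = @_;

    # :raw keeps Perl from translating line endings, so byte offsets stay exact.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Line N (1-based) starts at byte offset (N - 1) * $reclen.
    seek( $fh, ( $from - 1 ) * $reclen, 0 ) or die "Cannot seek in $path: $!";

    my @seq;
    for ( $from .. $to ) {
        my $line = <$fh>;
        last unless defined $line;    # ran off the end of the file
        $line =~ s/\r?\n\z//;         # strip LF or CRLF
        push @seq, $line;
    }
    close $fh;
    return @seq;
}
```

With one base per line and a 1-byte line ending, a call such as read_fixed_lines( 'sequence.txt', 3, 6, 2 ) would fetch lines 3 to 6; the file name here is only a placeholder.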

If you happen to know that all the lines are of exactly the same length (accounting for line-ending characters, generally 1 byte on Unix/Linux and 2 on Windows), you can use seek to go directly to a specified point in the file.
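If you are not sure the assumption holds, a cheap sanity check might look like the sketch below. The helper name guess_record_length is hypothetical. The check is necessary but not sufficient: a file can pass it and still contain lines of different lengths, and a file whose last line lacks a line ending will fail it.

```perl
use strict;
use warnings;

# Hypothetical helper: measure the first line (line ending included) and
# check that the file size is a whole multiple of that length.
# Returns the record length in bytes, or undef if the check fails.
sub guess_record_length {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $first = <$fh>;
    close $fh;
    return undef unless defined $first;    # empty file

    my $reclen = length $first;            # e.g. 2 for "A\n", 3 for "A\r\n"
    my $size   = -s $path;                 # file size in bytes
    return $size % $reclen == 0 ? $reclen : undef;
}
```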

The seek function lets you specify a file position in bytes/characters, not in lines. In the general case, the only way to go to a specified line number is to read from the beginning and skip that many lines (minus one).
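A minimal sketch of that general approach, which makes no assumption about line lengths. It relies on Perl's built-in $. variable (the current line number of the last-read filehandle) and stops reading once it is past the end of the range. The helper name read_line_range is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $from..$to (1-based, inclusive) by reading
# from the top of the file and skipping everything before $from.
sub read_line_range {
    my ( $path, $from, $to ) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";

    my @seq;
    while ( my $line = <$fh> ) {
        next if $. < $from;    # $. is the current line number of $fh
        last if $. > $to;      # past the range; stop reading
        chomp $line;
        push @seq, $line;
    }
    close $fh;
    return @seq;
}
```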

The exception is when you have an index mapping line numbers to byte offsets: then you can look up the specified line number in the index and use seek to jump to that location. To do this, you have to build the index separately (a process that requires reading through the entire file) and make sure the index is always up to date. If the file changes frequently, this is likely to be impractical.
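One way such an index could be sketched, kept in memory as a plain array of byte offsets. Both helper names are hypothetical, and the index is only valid as long as the file does not change after it is built.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the file once and record the byte offset at which
# each line starts. $index->[0] is the offset of line 1, and so on.
sub build_line_index {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @index = (0);
    while ( defined( my $line = <$fh> ) ) {
        push @index, tell($fh);    # position just after the line we read
    }
    close $fh;
    pop @index;    # the last entry is end-of-file, not the start of a line
    return \@index;
}

# Hypothetical helper: use the index to fetch lines $from..$to (1-based).
sub read_indexed_lines {
    my ( $path, $index, $from, $to ) = @_;
    return () if $from < 1 || $from > @$index;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek( $fh, $index->[ $from - 1 ], 0 ) or die "Cannot seek in $path: $!";

    my @seq;
    for ( $from .. $to ) {
        my $line = <$fh>;
        last unless defined $line;    # ran off the end of the file
        $line =~ s/\r?\n\z//;         # strip LF or CRLF
        push @seq, $line;
    }
    close $fh;
    return @seq;
}
```

Building the index costs one full pass over the file, so it only pays off when many ranges are requested from the same unchanged file.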

I'm not aware of any existing tools for building and using such an index, but I wouldn't be surprised if they exist. Either way, it should be easy enough to roll your own.
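For reference, the Tie::File module distributed with Perl presents a file as an array of lines and reads it lazily, so it may serve as a ready-made alternative to a hand-rolled index. The sketch below assumes Tie::File is installed and uses its default record separator; the helper name read_tied_lines is hypothetical. The first access to a given line still has to read the file up to that line.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: return lines $from..$to (1-based, inclusive) through a
# read-only Tie::File array. Line endings are removed by default (autochomp).
sub read_tied_lines {
    my ( $path, $from, $to ) = @_;

    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";

    # Array index 0 is line 1; elements past the end of the file are undef.
    my @seq = grep { defined } @lines[ $from - 1 .. $to - 1 ];

    untie @lines;
    return @seq;
}
```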

But unless scanning the file to find the line number you want is a significant performance bottleneck, I wouldn't bother with the extra complexity.
