
How can I write compressed files on the fly using Perl?

I am generating relatively large files using Perl. The files I am generating are of two kinds:

  1. Table files, i.e. text files that I print line by line (row by row) and that contain mainly numbers. A typical line looks like:

    126891 126991 14545 12

  2. Serialized objects that I create and then store in a file using Storable::nstore. These objects usually contain some large hash with numeric values. The values in the object might have been packed (with pack) to save space, in which case the object unpacks each value before using it.

Currently I'm usually doing the following:

use IO::Compress::Gzip qw(gzip $GzipError);

# create normal, uncompressed file ($out_file)
# ...

# compress file using gzip
my $gz_out_file = "$out_file.gz";
gzip $out_file => $gz_out_file or die "gzip failed: $GzipError";

# delete uncompressed file
unlink($out_file) or die "can't unlink file $out_file: $!";

This is quite inefficient, since I first write the large file to disk, and then gzip reads it again and compresses it. So my questions are as follows:

  1. Can I create a compressed file without first writing a file to disk? Is it possible to create a compressed file sequentially, i.e. printing line by line as in scenario (1) described earlier?

  2. Does gzip sound like an appropriate choice? Are there any other recommended compressors for the kind of data I have described?

  3. Does it make sense to pack values in an object that will later be stored and compressed anyway?

My main considerations are saving disk space and allowing fast decompression later on.

  1. You can use IO::Zlib or PerlIO::gzip to tie a file handle so that it compresses on the fly.

  2. As for which compressors are appropriate, just try several and see how they do on your data. Also keep an eye on how much CPU and memory they use for compression and decompression.

  3. Again, test to see how much pack helps with your data, and how much it affects your performance. In some cases it may help; in others it may not. It really depends on your data.
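
For the first point, a minimal sketch of printing table rows straight into a gzip file with IO::Zlib might look like the following. The file name table.txt.gz and the sample rows are made up for illustration. The object interface is used here; the module also documents a tie interface.

```perl
use strict;
use warnings;
use IO::Zlib;

# hypothetical output name and sample rows, for illustration only
my $gz_out_file = 'table.txt.gz';
my @rows = ([126891, 126991, 14545, 12], [126992, 127050, 311, 7]);

# "wb" opens the file for compressed writing, so no uncompressed copy is written to disk
my $out = IO::Zlib->new($gz_out_file, 'wb')
    or die "can't open $gz_out_file for writing: $!";
$out->print(join(' ', @$_), "\n") for @rows;
$out->close;

# read the rows back line by line to check the round trip
my $in = IO::Zlib->new($gz_out_file, 'rb')
    or die "can't open $gz_out_file for reading: $!";
my @lines;
while (defined(my $line = $in->getline)) {
    push @lines, $line;
}
$in->close;
print @lines;
```

The resulting file is an ordinary gzip file, so it can also be inspected with the usual command-line tools.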

You can also open() a filehandle to a scalar instead of a real file, and use this filehandle with IO::Compress::Gzip. I haven't actually tried it, but it should work. I use something similar with Net::FTP to avoid creating files on disk.

Since v5.8.0, Perl has been built with PerlIO by default. Unless you've changed this (i.e., Configure -Uuseperlio), you can open filehandles directly to Perl scalars via:

open($fh, '>', \$variable) || ..

(from the open() documentation)
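
A small sketch of one way an in-memory filehandle could be combined with IO::Compress::Gzip: the rows are printed to a scalar through the handle, and the functional gzip interface then compresses that scalar. The variable names and sample rows are assumptions for illustration. The whole uncompressed buffer lives in memory here, so this only suits data that fits in RAM.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# print to an in-memory buffer instead of a file on disk
my $buffer = '';
open(my $fh, '>', \$buffer) or die "can't open in-memory handle: $!";
print {$fh} "$_ ", $_ * 2, "\n" for 1 .. 5;    # made-up sample rows
close($fh) or die "can't close in-memory handle: $!";

# compress the buffer; a file name could be given in place of \$compressed
my $compressed;
gzip \$buffer => \$compressed or die "gzip failed: $GzipError";

# decompress again to confirm the data survives the round trip
my $restored;
gunzip \$compressed => \$restored or die "gunzip failed: $GunzipError";
print $restored eq $buffer ? "round trip ok\n" : "round trip mismatch\n";
```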

IO::Compress::Zlib has an OO interface that can be used for this.

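use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# open a gzip stream that writes straight to out.gz
my $z = IO::Compress::Gzip->new('out.gz')
    or die "gzip failed: $GzipError";
$z->print($_, "\n") for 0 .. 10;
$z->close;    # flush remaining data and finish the gzip stream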
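
For the serialized objects in scenario (2) of the question, a comparable sketch could serialize in memory with Storable's nfreeze and hand the result to gzip, so that no uncompressed file is written. The hash contents and the file name object.sto.gz are hypothetical. Note that a frozen image is not the same as an nstore file, so it has to be read back with thaw rather than retrieve.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# hypothetical object and file name, for illustration only
my %object = (126891 => 12, 126991 => 14545);
my $gz_out_file = 'object.sto.gz';

# serialize in memory (network byte order, as nstore uses), then gzip to disk
my $frozen = nfreeze(\%object);
gzip \$frozen => $gz_out_file or die "gzip failed: $GzipError";

# later: decompress into memory and rebuild the structure
my $frozen_copy;
gunzip $gz_out_file => \$frozen_copy or die "gunzip failed: $GunzipError";
my $copy = thaw($frozen_copy);
print "$_ => $copy->{$_}\n" for sort keys %$copy;
```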


 