
Using sed on a compressed file

I have written a file-processing program, and it now needs to read from a gzip-compressed file (.gz; the uncompressed file may be as large as 2 TB).

Is there a sed equivalent for compressed files (the way zcat is to cat)? If not, what would be the best approach to do the following efficiently?

    ONE=$(zcat filename.gz | sed -n "$counts")

$counts: the counter for the line to read (the file is read line by line)

The above method works, but it is quite slow for a large file, as I need to read each line and perform matching on certain fields.

Thanks

EDIT

Though not directly helpful, here is a set of z-commands:

http://www.cyberciti.biz/tips/decompress-and-expand-text-files.html
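
For reference, a minimal sketch of what such z-commands look like in practice. The sample file, its contents, and the search pattern are made up for illustration; zgrep ships with gzip on most Linux systems, and gzip -dc is used here as the portable spelling of zcat.

```shell
#!/bin/sh
# Build a tiny gzip file to experiment with (hypothetical sample data).
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\ngamma 3\n' | gzip -c > "$tmpdir/sample.gz"

# zgrep: run grep on the compressed file directly, with line numbers.
match=$(zgrep -n 'beta' "$tmpdir/sample.gz")
echo "$match"

# Print only line n, then quit immediately, so sed stops reading and
# the decompressor is not driven to the end of a very large file.
n=2
line=$(gzip -dc "$tmpdir/sample.gz" | sed -n "${n}{p;q;}")
echo "$line"

rm -rf "$tmpdir"
```

The early quit only helps when the wanted line is near the start; a line near the end of the file still requires decompressing almost everything before it.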

Well, you can have either more speed (i.e. use uncompressed files) or more free space (i.e. use compressed files and the pipe you showed)... sorry. Using compressed files will always have an overhead.
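
That overhead can at least be paid only once. If the script re-runs the zcat | sed pipeline for every value of $counts (an assumption; the question does not show the surrounding loop), each call decompresses the file again from the start. A minimal sketch of the alternative, decompressing once and doing the field matching in a single awk pass, might look like this. The sample data, the field layout, and the match condition are all hypothetical:

```shell
#!/bin/sh
# Hypothetical sample: whitespace-separated records, one per line.
tmpdir=$(mktemp -d)
printf 'id1 ok 10\nid2 fail 20\nid3 ok 30\n' | gzip -c > "$tmpdir/sample.gz"

# One decompression pass: awk sees each line exactly once, tests the
# second field, and prints the line number and first field on a match.
# (gzip -dc is equivalent to zcat.)
matches=$(gzip -dc "$tmpdir/sample.gz" | awk '$2 == "ok" { print NR ": " $1 }')
echo "$matches"

rm -rf "$tmpdir"
```

Because the data is streamed through the pipe, nothing close to the uncompressed size is ever written to disk or held in memory.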

If you understand the internal structure of the compression format, you could in principle write a pattern matcher that operates on the compressed data without fully decompressing it, determining from the compressed data alone whether the pattern would be present in a given piece of decompressed data.

If the pattern has any complexity at all, this sounds like quite a complicated project, since you would have to handle cases where the pattern is satisfied only by the combination of output from two (or more) separate pieces of decompression.


 