
How to read a specific line from a gzip file with Python

I have a big gzip file (11 GB) and I want to print a chosen line from it as fast as possible with Python. I tried linecache.getline(), but because that function opens the file itself, there is no way to have the file opened with gzip.

linecache expects a text file. A file that has been compressed with gzip is not a text file. Doing what you want takes two steps: (1) decompress the file so that you have a text file; (2) use linecache on that text file. You can do both of those things in Python, but only one after the other.
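The two steps above can be sketched as follows. This is a minimal illustration, not a tuned solution: the function names `decompress_gzip` and `get_line` and both path arguments are hypothetical. Note also that linecache reads the whole text file into memory on first access, which may be impractical for a file of this size.

```python
import gzip
import linecache
import shutil


def decompress_gzip(gz_path, txt_path):
    """Step 1: write the decompressed contents of gz_path to txt_path."""
    # copyfileobj streams in chunks, so the data is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def get_line(txt_path, line_number):
    """Step 2: return line `line_number` (1-based) of the plain text file.

    linecache.getline returns an empty string if the line does not exist.
    """
    return linecache.getline(txt_path, line_number)
```

The decompression only has to happen once; later lookups can reuse the text file.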

I understand that you want to get at a specific line without having to decompress the entire gzip file. But that is not how gzip compression works. There is unlikely to be anything in the compressed data that corresponds to the notion of a line of text.
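Because the compressed data has no line index, reaching line N still means decompressing everything before it. That decompression does not have to go to disk first, though. The sketch below streams the file through `gzip.open` in text mode and stops as soon as the requested line has been read. The function name `read_line_from_gzip` and the default UTF-8 encoding are assumptions for illustration.

```python
import gzip
from itertools import islice


def read_line_from_gzip(gz_path, line_number, encoding="utf-8"):
    """Return line `line_number` (1-based), or None if the file is shorter.

    The stream is decompressed sequentially from the start, so the cost
    grows with how far into the file the requested line is.
    """
    if line_number < 1:
        raise ValueError("line_number must be >= 1")
    with gzip.open(gz_path, "rt", encoding=encoding) as fh:
        # islice skips the first line_number - 1 lines without storing them.
        return next(islice(fh, line_number - 1, line_number), None)
```

This avoids the temporary text file and keeps memory use small, but every call starts again from the beginning of the file, so it suits occasional lookups better than many repeated ones.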
