
Read compressed stdin

I would like to be able to run a command like this:

pv -ptebar compressed.csv.gz | python my_script.py

Inside my_script.py, I would like to decompress the contents of compressed.csv.gz arriving on stdin and parse them with Python's csv module. I expected something like this to work:

import csv
import gzip
import sys


with gzip.open(fileobj=sys.stdin, mode='rt') as f:
    reader = csv.reader(f)
    print(next(reader))
    print(next(reader))
    print(next(reader))

Of course, it doesn't work, because gzip.open doesn't have a fileobj argument. Could you provide a working example that solves this?

UPDATE

Traceback (most recent call last):
  File "my_script.py", line 8, in <module>
    print(next(reader))
  File "/usr/lib/python3.5/gzip.py", line 287, in read1
    return self._buffer.read1(size)
  File "/usr/lib/python3.5/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 404, in _read_gzip_header
    magic = self._fp.read(2)
  File "/usr/lib/python3.5/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The traceback above appeared after I applied @Rawing's advice.

In Python 3.3+, you can pass a file object to gzip.open:

The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to.

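So you can simply drop the fileobj= keyword. The file object has to be a binary stream, though: sys.stdin is a text stream, so passing it directly fails with the UnicodeDecodeError shown in the traceback above. Pass sys.stdin.buffer instead:

with gzip.open(sys.stdin.buffer, mode='rt') as f:

Or, if you want the raw bytes rather than decoded text (slightly more efficient, but csv.reader needs text in Python 3):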

with gzip.open(sys.stdin.buffer, mode='rb') as f:

If for some odd reason you're using a Python older than 3.3, you can directly invoke the gzip.GzipFile constructor. However, these old versions of the gzip module didn't have support for files opened in text mode, so we'll use sys.stdin's underlying buffer instead:

with gzip.GzipFile(fileobj=sys.stdin.buffer) as f:
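
Since gzip.GzipFile yields bytes and csv.reader needs text in Python 3, one way to bridge the two is to wrap the decompressed stream in io.TextIOWrapper. The following is a minimal sketch, not a definitive implementation: the helper name read_rows and the UTF-8 encoding are assumptions that do not come from the question.

```python
import csv
import gzip
import io


def read_rows(binary_stream, encoding="utf-8"):
    """Yield parsed CSV rows from a gzip-compressed binary stream.

    read_rows is a hypothetical helper; the encoding is assumed to be UTF-8.
    """
    with gzip.GzipFile(fileobj=binary_stream, mode="rb") as gz:
        # newline="" lets the csv module handle line endings itself.
        with io.TextIOWrapper(gz, encoding=encoding, newline="") as text:
            for row in csv.reader(text):
                yield row
```

In my_script.py this would be called as read_rows(sys.stdin.buffer), after importing sys, so that the gzip layer sees the raw bytes coming through the pipe.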

Using gzip.open(sys.stdin.buffer, 'rt') fixed the problem for Python 3.
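
Putting that together with the script from the question, a complete version might look like the sketch below. The function name first_rows, the count parameter, and the UTF-8 encoding are illustrative assumptions; the gzip.open call on a binary stream in 'rt' mode is the part taken from the answers.

```python
import csv
import gzip
import itertools


def first_rows(binary_stream, count=3, encoding="utf-8"):
    """Return up to `count` parsed CSV rows from a gzip-compressed binary stream.

    first_rows is a hypothetical helper; the encoding is assumed to be UTF-8.
    """
    # mode="rt" makes gzip.open decode the decompressed bytes to text,
    # which is what csv.reader expects in Python 3.
    with gzip.open(binary_stream, mode="rt", encoding=encoding, newline="") as f:
        return list(itertools.islice(csv.reader(f), count))
```

In my_script.py, import sys and print each row returned by first_rows(sys.stdin.buffer); the pv pipeline from the question can then be used unchanged. Using islice instead of repeated next() calls avoids a StopIteration when the file has fewer rows than requested.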
