gzip unexpected end of file

I can check the integrity of a gzip file with gzip -t file.gz or zcat file.gz > /dev/null, as described in previous answers.
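
For a script rather than the shell, a rough Python counterpart to gzip -t is to read the file through the standard gzip module and treat an exception as failure. This is a sketch, not a replica of gzip -t: the helper name gzip_is_complete and the chunk size are illustrative choices. It also still reads the whole file, which is exactly the cost the question wants to avoid.

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Roughly what `gzip -t` checks, but it still reads and decompresses
    every byte of the file.
    """
    if os.path.getsize(path) == 0:
        return False  # an empty file is not a valid gzip file
    try:
        with gzip.open(path, 'rb') as f:
            # Read in chunks so a large file is never held in memory at once.
            while f.read(chunk_size):
                pass
    except EOFError:
        return False  # file ended before the end-of-stream marker or trailer
    except (OSError, zlib.error):
        return False  # bad header, corrupt deflate data, or CRC/length mismatch
    return True
```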

Sometimes I have jobs dying before the compression of a large file finishes. If I check such a file from beginning to end, I get an "unexpected end of file" error. But is it possible to test only that the compressed file does not end unexpectedly, so that I don't have to read through the entire file?

EDIT 2018, following the answer from Mark Adler below (Python 3.3+ solution; text mode 'wt' in gzip.open requires 3.3):

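import os
import string
import gzip

# Write a small gzip file whose uncompressed content is 26 bytes long.
with gzip.open('test.gz', 'wt') as f:
    f.write(string.ascii_lowercase)

# The last 4 bytes of a gzip file are the ISIZE field: the uncompressed
# length modulo 2**32, stored little-endian.
with open('test.gz', 'rb') as f:
    f.seek(-4, os.SEEK_END)
    length = int.from_bytes(f.read(4), byteorder='little')
    assert length == 26
    print('Thanks Mark Adler!')
    print('The English alphabet has {length} letters.'.format(length=length))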

No, there is not. You would need to decompress all the way through to see whether the deflate-compressed data terminates properly and is followed by a 32-bit CRC and the uncompressed data length modulo 2^32.
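
To make this concrete, here is a sketch that streams the file through Python's standard zlib module, requires the deflate stream to reach its end-of-stream marker, and then compares the trailer's CRC-32 and length with values computed from the output. It assumes a single-member gzip file with nothing after it; the function name check_gzip_fully and the chunk size are hypothetical. With gzip framing enabled, zlib already verifies the trailer itself, so the explicit comparison is there only to show what the trailer holds. Either way, every byte gets decompressed.

```python
import os
import struct
import zlib


def check_gzip_fully(path, chunk_size=1 << 20):
    """Return True if the single-member gzip file at `path` is complete.

    Decompresses the whole file, requires the deflate stream to reach its
    end-of-stream marker, then compares the 8-byte trailer (CRC-32, then the
    uncompressed length mod 2**32, both little-endian) with computed values.
    """
    # 16 + MAX_WBITS: expect a gzip header and trailer around the deflate data.
    # zlib then also checks the trailer and raises zlib.error on a mismatch.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    crc = 0
    length = 0
    try:
        with open(path, 'rb') as f:
            while not d.eof:
                chunk = f.read(chunk_size)
                if not chunk:
                    return False  # file ended before the stream (and trailer) did
                out = d.decompress(chunk)
                crc = zlib.crc32(out, crc)
                length += len(out)
            if d.unused_data or f.read(1):
                return False  # trailing bytes, e.g. a second member: not handled here
            f.seek(-8, os.SEEK_END)
            stored_crc, stored_len = struct.unpack('<II', f.read(8))
    except zlib.error:
        return False  # corrupt data, or zlib's own CRC/length check failed
    return stored_crc == crc and stored_len == length % 2**32
```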

If you happen to know the length of the uncompressed data, or know some constraints on the length, then you can check the last four bytes of the gzip file to see whether they match or meet the constraint. If they do not, then you know that the gzip file didn't finish. If they do, then you can only conclude that it is probably ok. (There is some possibility that the stream happened to terminate early with the last four bytes meeting the constraint by accident.)
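
A minimal sketch of that quick check, assuming a single-member gzip file (in a multi-member file the last four bytes describe only the last member). The helper names read_isize, probably_complete and probably_within are hypothetical, and probably_within assumes the known bounds stay below 2**32 so the modulo never wraps. A True result still means only "probably complete", for the reason given above.

```python
import os


def read_isize(path):
    """Return the last four bytes of the file as a little-endian integer,
    or None if the file is too short to hold them.

    For a complete single-member gzip file this is the ISIZE field
    (uncompressed length mod 2**32); for a truncated file it is whatever
    bytes happen to be at the end.
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 4:
            return None
        f.seek(-4, os.SEEK_END)
        return int.from_bytes(f.read(4), byteorder='little')


def probably_complete(path, expected_len):
    """False: definitely not a finished stream of that length. True: probably ok."""
    isize = read_isize(path)
    return isize is not None and isize == expected_len % 2**32


def probably_within(path, low, high):
    """Same idea for a known range [low, high], assuming high < 2**32."""
    isize = read_isize(path)
    return isize is not None and low <= isize <= high
```

Because only the final four bytes are read, the cost of this check does not depend on the size of the file.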
