gzip unexpected end of file

Question

I can check the integrity of a gzip file with gzip -t file.gz or zcat file.gz > /dev/null, as per previous answers.

Sometimes my jobs die before the compression of a large file finishes. If I check the file from beginning to end, I get an error about an unexpected end of file. But is it possible to test only that the compressed file has no unexpected end, so that I don't have to read through the entire file?

EDIT 2018 in accordance with answer from Mark Adler below (Python 3.2+ solution):

import os
import string
import gzip

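# Write a small gzip file whose uncompressed length is known (26 bytes).
with gzip.open('test.gz', 'wt') as f:
    f.write(string.ascii_lowercase)

# The last four bytes of a gzip file hold the uncompressed length
# modulo 2**32, stored little-endian.
with open('test.gz', 'rb') as f:
    f.seek(-4, os.SEEK_END)
    length = int.from_bytes(f.read(4), byteorder='little')
    assert length == 26
    print('Thanks Mark Adler!')
    print('The English alphabet has {length} letters.'.format(length=length))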

Answer 1

No, there is not. You would need to decompress all the way through to see whether the deflate-compressed data terminates properly and is followed by a 32-bit CRC and the uncompressed data length modulo 2^32.
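As a rough illustration of what "decompress all the way through" looks like from Python, here is a minimal sketch. It assumes the standard gzip module, which raises EOFError on a truncated stream. The function name gzip_is_complete and the chunk_size parameter are made up for this example. It still reads the whole file, but it discards the output chunk by chunk, so memory use stays small.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the file at `path` decompresses to the end without error.

    Hypothetical helper: it reads the entire compressed stream, which is
    what the answer says is required for a definite result.
    """
    with open(path, 'rb') as raw:  # a missing file still raises here
        try:
            with gzip.GzipFile(fileobj=raw) as f:
                # Read and discard the decompressed data chunk by chunk.
                while f.read(chunk_size):
                    pass
        except (EOFError, OSError, zlib.error):
            # EOFError: the stream ended early (truncated file).
            # OSError (incl. gzip.BadGzipFile) / zlib.error: corrupt data.
            return False
    return True
```

Calling gzip_is_complete('file.gz') is intended to play the same role as gzip -t file.gz, just from inside a script.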

If you happen to know the length of the uncompressed data, or know some constraints on the length, then you can check the last four bytes of the gzip file to see if they match or meet the constraint. If they do not agree, then you know that the gzip file didn't finish. If they do agree, then you can only conclude that it is probably ok. (There is some possibility that the stream happened to terminate early with the last four bytes meeting the constraint by accident.)
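The quick check described above could be sketched as follows, assuming the expected uncompressed length is known in advance. The names gzip_trailer_length and length_probably_ok are hypothetical. The 18-byte minimum is the 10-byte gzip header plus the 8-byte trailer. A True result only means "probably ok", as explained above, and for a file made of several concatenated gzip members the last four bytes describe only the final member.

```python
import os
import struct


def gzip_trailer_length(path):
    """Return the last four bytes of the file as a little-endian unsigned int.

    In a complete gzip file this is the uncompressed length modulo 2**32.
    """
    with open(path, 'rb') as f:
        f.seek(-4, os.SEEK_END)
        return struct.unpack('<I', f.read(4))[0]


def length_probably_ok(path, expected_length):
    """Hypothetical helper: compare the trailer with a known length.

    False means the file is certainly incomplete or wrong.
    True means it is probably, but not certainly, complete.
    """
    # Anything shorter than header (10 bytes) + trailer (8 bytes)
    # cannot be a finished gzip file.
    if os.path.getsize(path) < 18:
        return False
    return gzip_trailer_length(path) == expected_length % 2**32
```

For the test.gz file from the question, length_probably_ok('test.gz', 26) should return True without reading anything but the final four bytes.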

solution1
2 ACCEPTED 2015-10-08 20:42:37