How is the gzip file size encoded?

The gzip file format stores the uncompressed (original) file size in the last 4 bytes of the compressed file. The "gzip -l" command reports the compressed and uncompressed sizes, the compression ratio, and the original filename.

Looking around Stack Overflow, there are a couple of mentions of decoding the size stored in the last 4 bytes.

What is the encoding of the size: big-endian (most significant byte first) or little-endian (least significant byte first)? And is the value signed or unsigned?

This code snippet seems to work for me:

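#include <stdio.h>
#include <sys/stat.h>

FILE* fh; // assume the file handle is already opened in binary mode
unsigned char szbuf[4];
struct stat statbuf;
fstat(fileno(fh), &statbuf); // fstat() takes a file descriptor, not a FILE*
unsigned long clen = statbuf.st_size; // length of the compressed file
fseek(fh, clen - 4, SEEK_SET); // the size is stored in the last 4 bytes
size_t count = fread(szbuf, 1, 4, fh); // count should be 4 on success
// little-endian: szbuf[0] is the least significant byte; cast before shifting to avoid signed int overflow
unsigned long ulen = ((unsigned long)szbuf[3] << 24) | ((unsigned long)szbuf[2] << 16) | ((unsigned long)szbuf[1] << 8) | szbuf[0];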

Here are a few related posts, which seem to imply that the value is little-endian and an unsigned long (0..4GB-1).

Determine uncompressed size of GZIP file

GZIPOutputStream not updating Gzip size bytes

Determine size of file in gzip

Gzip.org has more information about Gzip

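The RFC says the size is stored modulo 2^32, which means uint32_t, and experimentation using a .NET GZipStream shows it is little-endian. The RFC also states that all multi-byte numbers in the format are stored least significant byte first.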

RFC 1952
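To make the two points above concrete (little-endian byte order, value taken modulo 2^32), here is a minimal sketch in C. The function names gzip_decode_isize and gzip_expected_isize are hypothetical helpers made up for illustration; they are not part of zlib or any gzip API. The sketch assumes the caller has already read the last 4 bytes of the file into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the 4-byte size field at the end of a gzip file.
   trailer[0] is the least significant byte (little-endian). */
uint32_t gzip_decode_isize(const unsigned char *trailer)
{
    assert(trailer != NULL);
    /* Cast before shifting so the arithmetic is done in unsigned 32-bit,
       not in signed int after integer promotion. */
    return (uint32_t)trailer[0]
         | ((uint32_t)trailer[1] << 8)
         | ((uint32_t)trailer[2] << 16)
         | ((uint32_t)trailer[3] << 24);
}

/* Hypothetical helper: the value the size field would hold for a given
   original size. Only the low 32 bits are kept, so sizes of 4 GiB or
   more wrap around. */
uint32_t gzip_expected_isize(uint64_t original_size)
{
    return (uint32_t)(original_size & 0xFFFFFFFFu);
}
```

For example, the trailer bytes 78 56 34 12 decode to 0x12345678, and an original file of 4 GiB + 5 bytes would be recorded as 5, so the field cannot be trusted as the true size for inputs of 4 GiB or larger.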
