
How to check if a file is gzip compressed?

I have a C / C++ program which needs to read in a file that may or may not be gzip compressed. I know we can use gzread() from zlib to read in both compressed and uncompressed files - however, I want to use the zlib functions ONLY if the file is gzip compressed (for performance reasons).

So is there any way to programmatically detect or check whether a given file is gzipped from C / C++?

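There is a magic number at the beginning of the file. Just read the first two bytes and check that they are 0x1f followed by 0x8b. Compare them byte by byte rather than as a 16-bit integer, so the check does not depend on the machine's endianness.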
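As a minimal sketch of that check, assuming plain C stdio and a hypothetical helper name looks_like_gzip (not part of zlib), it could look like this:

```c
#include <stdio.h>

/* Hypothetical helper, not a zlib function.
   Returns 1 if the file starts with the gzip magic bytes 0x1f 0x8b,
   0 if it does not, and -1 if the file cannot be opened. */
int looks_like_gzip(const char *path)
{
    unsigned char magic[2];
    FILE *fp = fopen(path, "rb");   /* binary mode matters on Windows */
    if (fp == NULL)
        return -1;

    size_t n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);

    /* Compare byte by byte so the result does not depend on endianness.
       A file shorter than two bytes cannot be gzip. */
    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A caller could then open the file with gzopen() only when this returns 1 and fall back to fopen() otherwise. Keep in mind that this is only the two-byte test, so an uncompressed file that happens to start with those bytes will be misidentified.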

Do you prefer false positives, false negatives, or no false results at all (there goes performance down the drain...)?

RFC 1952 (GZIP file format specification version 4.3) states that the first two bytes of each member, and therefore of the file, are '\x1F' and '\x8B'. Use that as a first check, bearing in mind that it can produce false positives.

What is the difference in performance between reading compressed and uncompressed files using gzread()?

Anyway, in order to detect whether a file is gzipped, you can read the magic number at the beginning of the file, which is the byte pair 1f 8b.

You can test for the signatures described in RFCs 1950 (zlib) and 1952 (gzip) to get an idea. For gzip files the second one is the relevant one. Other formats can produce false positives on the two magic bytes alone, so you should check as much of the header as possible for plausible values.
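Checking more of the header might look like the sketch below. It relies on the fixed 10-byte gzip header laid out in RFC 1952 (ID1, ID2, CM, FLG, MTIME, XFL, OS); the function name plausible_gzip_header and the choice of which fields to test are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf could be the start of a gzip
   member according to the fixed header fields in RFC 1952, else 0. */
int plausible_gzip_header(const unsigned char *buf, size_t len)
{
    if (len < 10)                       /* fixed part of the header is 10 bytes */
        return 0;
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return 0;                       /* ID1, ID2: magic bytes */
    if (buf[2] != 8)
        return 0;                       /* CM: 8 (deflate) is the only defined method */
    if (buf[3] & 0xe0)
        return 0;                       /* FLG: bits 5-7 are reserved and must be zero */
    return 1;
}
```

This still cannot rule out every false positive, since it only confirms that the first bytes are consistent with a gzip header. The only certain test is to actually decompress the data and verify the trailing CRC-32, which costs the performance the question is trying to save.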

For just zlib streams it's somewhat harder, because they are even more prone to false positives. But you would rarely encounter those in the wild on their own.
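To illustrate why zlib streams are harder: RFC 1950 defines only a two-byte header (CMF and FLG) with no fixed magic value, just a compression method of 8, a window-size field of at most 7, and the rule that CMF*256 + FLG is a multiple of 31. A rough sketch of that test, with plausible_zlib_header as a hypothetical name:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first two bytes satisfy the
   zlib header constraints from RFC 1950, else 0. */
int plausible_zlib_header(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return 0;

    unsigned cmf = buf[0];
    unsigned flg = buf[1];

    if ((cmf & 0x0f) != 8)      /* CM: low nibble must be 8 (deflate) */
        return 0;
    if ((cmf >> 4) > 7)         /* CINFO: window size above 32K is not allowed */
        return 0;

    /* FCHECK: the two bytes, read as a big-endian 16-bit value,
       must be a multiple of 31. */
    return (cmf * 256u + flg) % 31u == 0;
}
```

The common header 78 9c passes, but so does ordinary text beginning with "x " (bytes 78 20), which shows how weak the test is compared with the gzip magic number.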
