
Identify if a file is a gzip file


I need to check, using C++, whether a file being opened is a gzip file or not.
In Python I use the following code to identify whether a file is gzipped:
test_file = "junk.txt.gz"
with open(test_file, "rb") as f:
    f_read_first_two_bytes = f.read(2)
    if f_read_first_two_bytes == b'\x1f\x8b':
        print("The file is a gzipped file", end='\n')

What is the equivalent in C++?

I am new to C++ and tried the following, but that obviously is not the right way:

#include <fstream>
#include <iostream>
using namespace std;

int main() {
    char p[3] = {0};
    p[2] = '\n';
    // open the junk.txt.gz file. We do not want to just go by the '.gz' in the file name,
    // but want to check just like the way we did in the Python code.
    ifstream is("./junk.txt.gz", std::ios::in|std::ios::out|std::ios::binary);
    // read two characters into p
    is.read(p, 2);
    cout << std::hex << p[0] << " " << std::hex << p[1] << endl;
    return 0;
}

"but that obviously is not the right way."

Well, obviously not, since you don't compare the bytes with anything. Otherwise, it is pretty much as "right" as the Python program is.

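A simple way to do the comparison is to interpret the bytes as unsigned char. Plain char may be signed, in which case the byte 0x8b is stored as a negative value and p[1] == 0x8b can never be true: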

auto up = reinterpret_cast<unsigned char*>(p);
if (up[0] == 0x1f && up[1] == 0x8b) {
    // the file starts with the gzip magic bytes
}
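Put together, a minimal sketch of the whole check could look like the function below. The name starts_with_gzip_magic is hypothetical (it is not from any library). It opens the file read-only in binary mode and also checks gcount(), so a file shorter than two bytes is reported as "not gzip" instead of comparing bytes that were never read:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from any library): returns true if the file at
// 'path' starts with the two gzip magic bytes 0x1f 0x8b, which is the same
// test the Python snippet performs.
bool starts_with_gzip_magic(const std::string& path) {
    std::ifstream is(path, std::ios::binary);  // read-only; std::ios::out is not needed
    if (!is)
        return false;                          // the file could not be opened

    char p[2] = {0, 0};
    is.read(p, 2);
    if (is.gcount() != 2)
        return false;                          // the file is shorter than two bytes

    // Compare as unsigned char: plain char may be signed, making 0x8b negative.
    auto up = reinterpret_cast<unsigned char*>(p);
    return up[0] == 0x1f && up[1] == 0x8b;
}
```

In main you would then write something like if (starts_with_gzip_magic("./junk.txt.gz")) std::cout << "The file is a gzipped file\n";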

PS: This is not necessarily the most accurate test for gzip files. It can produce false positives, since any file that happens to start with those two bytes passes.
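If fewer false positives matter, one option is to also check the next header fields defined by RFC 1952 (the gzip file format specification): byte 2 is the compression method, where 8 (deflate) is the only value the RFC defines, and the reserved bits 5-7 of the flag byte must be zero. The sketch below assumes that layout; both function names are hypothetical:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Sketch of a stricter gzip header test based on the RFC 1952 member header:
//   byte 0: ID1 = 0x1f, byte 1: ID2 = 0x8b,
//   byte 2: CM (compression method), 8 = deflate,
//   byte 3: FLG, whose bits 5-7 are reserved and must be zero,
//   bytes 4-9: MTIME (4 bytes), XFL, OS.
bool looks_like_gzip_header(const unsigned char* buf, std::size_t n) {
    if (n < 10)
        return false;                 // the fixed part of the header is 10 bytes
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return false;                 // magic bytes ID1, ID2
    if (buf[2] != 8)
        return false;                 // CM must be 8 (deflate)
    if ((buf[3] & 0xe0) != 0)
        return false;                 // a reserved FLG bit is set
    return true;
}

// Hypothetical convenience wrapper: read the first 10 bytes of a file and test them.
bool file_looks_like_gzip(const std::string& path) {
    std::ifstream is(path, std::ios::binary);
    if (!is)
        return false;                 // the file could not be opened
    unsigned char header[10] = {};
    is.read(reinterpret_cast<char*>(header), sizeof header);
    return looks_like_gzip_header(header, static_cast<std::size_t>(is.gcount()));
}
```

This still only inspects the header, so it reduces false positives rather than eliminating them; only actually decompressing the data (for example with zlib) shows that the whole file is valid gzip.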

I recommend not attempting to implement the test manually. There are open source libraries for this purpose (like there are for most purposes).

Instead of implementing the fingerprint checking yourself, you could install the libmagic library, which contains a large database of fingerprints for different file types.

Ubuntu: apt install libmagic-dev; Fedora: dnf install file-devel; or download the source from https://github.com/file/file

An example program that checks the files you give on the command line:

#include <magic.h>

#include <filesystem>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>

// A small class to manage the libmagic "cookie"
class MagicCookie {
public:
    // MAGIC_MIME - Return a MIME type string, instead of a textual description.
    MagicCookie() : cookie(magic_open(MAGIC_MIME), &magic_close) {
        if(not cookie)
            throw std::runtime_error("unable to initialize magic library");

        if(magic_load(cookie.get(), nullptr)) {
            throw std::runtime_error(std::string("cannot load magic database: ") +
                                     magic_error(cookie.get()));
        }
    }

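    // A function that checks a file and returns its MIME type.
    // magic_file() returns nullptr on failure, which must not be streamed to
    // std::cout, so the libmagic error message is returned instead.
    std::string File(const std::filesystem::path& file) {
        const char* result = magic_file(cookie.get(), file.string().c_str());
        if(result == nullptr)
            return std::string("error: ") + magic_error(cookie.get());
        return result;
    }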

private:
    std::unique_ptr<std::remove_pointer_t<magic_t>, decltype(&magic_close)> cookie;
};

int main(int argc, char* argv[]) {
    MagicCookie mc;

    // Find the MIME type for all files given on the command line:
    for(int idx = 1; idx < argc; ++idx) {
        std::cout << argv[idx] << ": MIME: " << mc.File(argv[idx]) << '\n';
    }
}

A gzipped file will show up with the MIME type application/gzip; charset=binary, so you can easily compare against that:

some_file.gz: MIME: application/gzip; charset=binary
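To act on the result instead of printing it, a small helper can strip the "; charset=..." parameter and compare the type. This is a sketch with a hypothetical name; it also accepts application/x-gzip, which is what older libmagic versions report for gzip files. Opening the cookie with MAGIC_MIME_TYPE instead of MAGIC_MIME would return the type without the charset part in the first place.

```cpp
#include <string_view>

// Hypothetical helper: decide from a libmagic MIME string such as
// "application/gzip; charset=binary" whether the file is gzip.
bool is_gzip_mime(std::string_view mime) {
    // Keep only the MIME type, dropping any "; charset=..." parameter.
    std::string_view type = mime.substr(0, mime.find(';'));
    return type == "application/gzip" || type == "application/x-gzip";
}

// Overload for the raw const char* that magic_file() returns,
// which is nullptr on failure (treated as "not gzip").
bool is_gzip_mime(const char* mime) {
    return mime != nullptr && is_gzip_mime(std::string_view(mime));
}
```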

If MIME output isn't what you want, the alternative flags for magic_open() are described at https://man7.org/linux/man-pages/man3/libmagic.3.html (libmagic can even analyze the contents of compressed files, via MAGIC_COMPRESS, if that's needed).

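Compile with -std=c++17 and link with -lmagic, listing the library after the source file, for example:

g++ -std=c++17 your_program.cpp -lmagic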

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 