Identify if a file is a gzip file


I need to check, using C++, whether a file being opened is a gzip file or not.
In Python I use the following code to identify if a file is gzipped:
test_file = "junk.txt.gz"
with open(test_file, "rb") as f:
    f_read_first_two_bytes = f.read(2)
if f_read_first_two_bytes==b'\x1f\x8b':
    print("The file is a gzipped file", end='\n')

What is the equivalent in C++?

I am new to C++ and tried the following, but that obviously is not the right way.

int main() {
    char p[3] = {0};
    p[2] = '\n';
    // Open the junk.txt.gz file. We do not want to just go by the '.gz' in the file name,
    // but want to check just like the way we did in the Python code.
    ifstream is("./junk.txt.gz", std::ios::in|std::ios::out|std::ios::binary);
    // read two characters into p
    is.read(p,2);
    cout << std::hex << p[0] << " " << std::hex << p[1] << endl;
    return 0;
}

"but that obviously is not the right way."

Well obviously not, since you don't compare the bytes with anything. Otherwise, it pretty much is "right" as much as the Python program is.

A simple way to do the comparison is to interpret the bytes as unsigned char (plain char may be signed, in which case p[1] == 0x8b would never be true):

auto up = reinterpret_cast<unsigned char*>(p);
if (up[0] == 0x1f && up[1] == 0x8b)
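
To put the comparison above in context, here is a minimal sketch of a complete helper that mirrors the Python check. The function name starts_with_gzip_magic is my own invention for illustration, and the file is opened read-only (std::ios::binary only), on the assumption that write access is not needed for the check:

```cpp
#include <array>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (name is illustrative): returns true if the file
// begins with the two gzip magic bytes 0x1f 0x8b.
bool starts_with_gzip_magic(const std::string& path) {
    // std::ifstream always adds std::ios::in; only binary mode is requested here.
    std::ifstream is(path, std::ios::binary);

    std::array<unsigned char, 2> header{};
    // read() expects char*, so cast the unsigned buffer for the call.
    is.read(reinterpret_cast<char*>(header.data()),
            static_cast<std::streamsize>(header.size()));

    // If the file could not be opened or is shorter than two bytes,
    // fewer than two bytes were read and it cannot be a gzip file.
    if (is.gcount() != 2) return false;

    return header[0] == 0x1f && header[1] == 0x8b;
}
```

Calling starts_with_gzip_magic("./junk.txt.gz") from main and printing a message when it returns true would give the same behaviour as the Python snippet.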

P.S. This is not necessarily the most accurate test for gzip files. It can have false positives.
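
If you still want a manual check, one way to cut down on false positives is to look at more of the fixed 10-byte gzip header than just the first two bytes. The sketch below is only a heuristic: it assumes the header layout from RFC 1952 (third byte is the compression method, where 8 means deflate, and the top three bits of the flags byte are reserved and must be zero). The function name plausible_gzip_header is made up for this example, and a file that passes can still turn out not to be valid gzip once you try to decompress it:

```cpp
#include <cstddef>

// Hypothetical helper (name is illustrative): a slightly stricter heuristic
// than the two-byte magic check. `data` points at the first bytes of the file.
bool plausible_gzip_header(const unsigned char* data, std::size_t size) {
    if (size < 10) return false;                     // fixed header is 10 bytes
    if (data[0] != 0x1f || data[1] != 0x8b) return false;  // ID1, ID2
    if (data[2] != 8) return false;                  // CM: 8 = deflate
    if ((data[3] & 0xe0) != 0) return false;         // FLG: reserved bits 5-7
    return true;
}
```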

I recommend not attempting to implement the test manually. There are open source libraries for this purpose (like there are for most purposes).

Instead of implementing the fingerprint checking yourself, you could install the libmagic library, which contains a lot of fingerprints for different file types.

Ubuntu: apt install libmagic-dev, Fedora: dnf install file-devel, or download the source from https://github.com/file/file

An example program that checks the files you give on the command line:

#include <magic.h>

#include <filesystem>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>  // std::remove_pointer_t

// A small class to manage the libmagic "cookie"
class MagicCookie {
public:
    // MAGIC_MIME - Return a MIME type string, instead of a textual description.
    MagicCookie() : cookie(magic_open(MAGIC_MIME), &magic_close) {
        if(not cookie)
            throw std::runtime_error("unable to initialize magic library");

        if(magic_load(cookie.get(), nullptr)) {
            throw std::runtime_error(std::string("cannot load magic database: ") +
                                     magic_error(cookie.get()));
        }
    }

    // A function that checks a file and returns its MIME type
    const char* File(const std::filesystem::path& file) {
        return magic_file(cookie.get(), file.string().c_str());
    }

private:
    std::unique_ptr<std::remove_pointer_t<magic_t>, decltype(&magic_close)> cookie;
};

int main(int argc, char* argv[]) {
    MagicCookie mc;

    // Find the MIME type for all files given on the command line:
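    for(int idx = 1; idx < argc; ++idx) {
        // magic_file() returns NULL on error, so don't stream it unchecked
        const char* mime = mc.File(argv[idx]);
        std::cout << argv[idx] << ": MIME: " << (mime ? mime : "(unknown)") << '\n';
    }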
}

A gzipped file will show up with its MIME type application/gzip; charset=binary, so you can easily compare against that:

some_file.gz: MIME: application/gzip; charset=binary
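
The comparison itself could look something like the sketch below. The helper name is_gzip_mime is hypothetical. It strips the "; charset=..." suffix before comparing, and it also accepts application/x-gzip on the assumption that some older libmagic databases report that spelling; drop that branch if it doesn't apply to your setup:

```cpp
#include <string_view>

// Hypothetical helper (name is illustrative): checks a MIME string such as
// "application/gzip; charset=binary" as returned in MAGIC_MIME mode.
bool is_gzip_mime(std::string_view mime) {
    // Keep only the type part before any ';' (find() returns npos if absent,
    // in which case substr() keeps the whole string).
    const std::string_view type = mime.substr(0, mime.find(';'));

    // "application/x-gzip" is accepted as an assumption about older databases.
    return type == "application/gzip" || type == "application/x-gzip";
}
```

With the program above, you would pass the (non-null) result of mc.File(argv[idx]) to this helper.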

Alternative modes for opening, if MIME isn't what you want, can be found here: https://man7.org/linux/man-pages/man3/libmagic.3.html. It can even analyze the content of compressed files if that's needed.

Compile with:

-std=c++17 -lmagic
