
MS Office Files are not recognized by comparing signatures

I need to check whether a file is a .doc, .ppt, .pdf or some other type. I have written the following code:

bool CheckFile(string path)
{
    // signature buffer; zero-filled in case the file is shorter than 8 bytes
    char sig[8] = {};
    ifstream myfile;
    myfile.open(path.c_str(), ios::in | ios::binary);
    if (myfile.fail())
    {
        MessageBox(0,"File Not Opened","ERROR",MB_OK);
        return false;
    }
    myfile.read(sig,8);
    myfile.close();

    //docx, pptx, xlsx
    if ((sig[0] == (0x50))&&(sig[1] == (0x4B))&&(sig[2] == (0x03))&&(sig[3] == (0x04))&&(sig[4] == (0x14))&&(sig[5] == (0x00))&&(sig[6] == (0x06))&&(sig[7] == (0x00)))
    {
        return true;
    }

    //doc, ppt, xls
    if ((sig[0] == (0xD0))&&(sig[1] == (0xCF))&&(sig[2] == (0x11))&&(sig[3] == (0xE0))&&(sig[4] == (0xA1))&&(sig[5] == (0xB1))&&(sig[6] == (0x1A))&&(sig[7] == (0xE1)))
    {
        return true;
    }

    //pdf
    if ((sig[0] == (0x25))&&(sig[1] == (0x50))&&(sig[2] == (0x44))&&(sig[3] == (0x46)))
    {
        return true;
    }
    return false;
}

I looked this up on the internet and found that we can compare signatures, i.e. the first 8 bytes for MS Office files and the first 4 bytes for PDFs. The code above does exactly that. CheckFile() returns TRUE for PDFs and for Office 2007 formats including .docx and .pptx, but returns FALSE for .doc and .ppt. The console output for a .doc file is:

FFFFFFD0
FFFFFFCF
11
FFFFFFE0
FFFFFFA1
FFFFFFB1
1A
FFFFFFE1

Each line is the hex value of one char in sig. Note that the low byte of each value matches the signature of a .doc file. I don't know why the extra FFFFFF is present. What could be the problem?

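As for the extra FFFFFF: every byte that shows it is larger than 0x7F, which means it is negative when interpreted as a signed byte. You are using char, which is signed on your compiler, so the value is sign-extended when it is promoted to int for printing. The same promotion is why the comparisons fail: sig[0] becomes the int 0xFFFFFFD0 (-48), which is never equal to the int literal 0xD0 (208). The .docx and PDF signatures only contain bytes below 0x80, so they are not affected.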

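You should change the buffer to unsigned char (or even better, the standard type uint8_t). Note that ifstream::read still expects a char*, so the buffer pointer has to be cast when reading.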
