I need to check whether a file is a .doc, .ppt, .pdf, or some other type. I have written the following code:
#include <fstream>
#include <string>
#include <windows.h>
using namespace std;

bool CheckFile(string path)
{
    char sig[8] = {0};   // first 8 bytes of the file (a stack array, so nothing leaks on early return)
    ifstream myfile(path.c_str(), ios::in | ios::binary);
    if (myfile.fail())
    {
        MessageBox(0, "File Not Opened", "ERROR", MB_OK);
        return false;    // 'break' is only valid inside a loop or switch
    }
    myfile.read(sig, 8);
    //docx, pptx, xlsx
    if (sig[0] == 0x50 && sig[1] == 0x4B && sig[2] == 0x03 && sig[3] == 0x04 &&
        sig[4] == 0x14 && sig[5] == 0x00 && sig[6] == 0x06 && sig[7] == 0x00)
        return true;
    //doc, ppt, xls
    if (sig[0] == 0xD0 && sig[1] == 0xCF && sig[2] == 0x11 && sig[3] == 0xE0 &&
        sig[4] == 0xA1 && sig[5] == 0xB1 && sig[6] == 0x1A && sig[7] == 0xE1)
        return true;
    //pdf
    if (sig[0] == 0x25 && sig[1] == 0x50 && sig[2] == 0x44 && sig[3] == 0x46)
        return true;
    return false;        // myfile is closed automatically when it goes out of scope
}
I looked this up and found that file types can be identified by comparing their signatures, i.e. the first 8 bytes for MS Office files and the first 4 bytes for PDFs. The code above does exactly that. CheckFile() returns true for PDFs and for the Office 2007 formats, including .docx and .pptx, but returns false for .doc and .ppt. The console output for a .doc file is:
FFFFFFD0
FFFFFFCF
11
FFFFFFE0
FFFFFFA1
FFFFFFB1
1A
FFFFFFE1
Each line is the hex value of one char in sig. Note that the last byte of each line matches the corresponding byte of the .doc signature. I don't know where the extra FFFFFF comes from. What could be the problem?
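As for the FFFFFF prefixes: notice that every affected byte is larger than 0x7F, which means it is negative when stored in a signed byte. You are using a signed char (plain char is signed by default on MSVC and on GCC/Clang for x86), and it is sign-extended when it is promoted to int for printing. The same promotion is why CheckFile() fails for .doc and .ppt: in sig[0] == 0xD0, sig[0] becomes the int -48 while the literal 0xD0 is the int 208, so the test is false. The PDF and .docx signatures contain no bytes above 0x7F, which is why those checks work.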
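To see both symptoms in isolation, here is a minimal sketch, assuming a 32-bit int. HexOfPromoted is a hypothetical helper that formats a value with %X, the way each byte appears in the question's output. It is exercised with an explicit signed char, because whether plain char is signed is implementation-defined.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats an int with "%X", like the console output in the question.
// A char argument is promoted to int first, so a negative signed char keeps its sign
// and shows up with leading F digits once reinterpreted as unsigned.
std::string HexOfPromoted(int value)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%X", static_cast<unsigned>(value));
    return std::string(buf);
}
```

With signed char s = static_cast<signed char>(0xD0), HexOfPromoted(s) returns "FFFFFFD0" and s == 0xD0 is false (-48 vs 208). With unsigned char u = 0xD0, the results are "D0" and true.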
Change sig to unsigned char (or, even better, the standard fixed-width type uint8_t from <cstdint>).
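A minimal sketch of that fix, assuming C++11 or later: the header is read into std::uint8_t, so 0xD0 stays 208, and compared against signature tables with std::memcmp. ReadHeader and IsOfficeOrPdf are hypothetical names; the signature bytes are the ones from the question. The sketch also checks gcount() so a file shorter than a signature is never compared against bytes that were not read, and it drops the MessageBox so it builds outside Windows.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Reads up to the first 8 bytes of the file at 'path' into 'out' as values 0..255.
// Returns how many bytes were actually read (0 if the file could not be opened).
static std::size_t ReadHeader(const std::string& path, std::uint8_t (&out)[8])
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file)
        return 0;
    file.read(reinterpret_cast<char*>(out), sizeof out);
    return static_cast<std::size_t>(file.gcount());
}

// Hypothetical replacement for CheckFile using the signatures from the question.
bool IsOfficeOrPdf(const std::string& path)
{
    // docx, pptx, xlsx (the 8-byte prefix the question tests for)
    static const std::uint8_t kOoxml[] = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
    // doc, ppt, xls
    static const std::uint8_t kOle2[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // pdf ("%PDF")
    static const std::uint8_t kPdf[] = {0x25, 0x50, 0x44, 0x46};

    std::uint8_t header[8] = {0};
    const std::size_t n = ReadHeader(path, header);

    // A signature can only match if at least that many bytes were read.
    auto matches = [&](const std::uint8_t* sig, std::size_t len) {
        return n >= len && std::memcmp(header, sig, len) == 0;
    };
    return matches(kOoxml, sizeof kOoxml) || matches(kOle2, sizeof kOle2) ||
           matches(kPdf, sizeof kPdf);
}
```

std::memcmp compares bytes as unsigned char, so it would avoid the sign problem even with a plain char buffer. Using uint8_t simply makes the intent explicit and keeps any hand-written comparisons such as header[0] == 0xD0 correct as well.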