
How to distinguish xlsx and docx files from zip archives?

I need to detect a file's real type so that I can identify and blacklist executables (exe, installers, etc.) and archive files (zip, rar, etc.). Relying on the extension is not enough, because the extension can be changed while the file's contents stay the same. I tried the Linux command:

file -b filename

This works perfectly for every file type except .xlsx and .docx, for which the command prints:

Zip archive data, at least v2.0 to extract

Because of this, I end up blacklisting .xlsx and .docx files as well.

Can anybody suggest a way to get the file type without using the extension that also works for xlsx and docx?

You have to update your file command (or its magic file).

Recent versions do recognize MSOOXML files:

$ file -b test.docx
Microsoft Word 2007+

$ file --version
file-5.32
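
With a recent enough file, the blacklist can be driven by the MIME type instead of the description text. Below is a minimal Ruby sketch of that idea. The names `BLOCKED_MIME_TYPES`, `blocked_mime?` and `detect_mime` are hypothetical, and the listed MIME strings are assumptions: the exact values depend on the installed magic database, so check what your version of file prints.

```ruby
require 'open3'

# Assumed MIME strings; verify them against the output of your file(1) version.
BLOCKED_MIME_TYPES = %w[
  application/zip
  application/x-rar
  application/x-dosexec
].freeze

# Pure policy check, kept separate so it can be exercised without file(1).
def blocked_mime?(mime)
  BLOCKED_MIME_TYPES.include?(mime.strip)
end

# Ask file(1) for the MIME type only; -b drops the leading file name.
def detect_mime(path)
  out, status = Open3.capture2('file', '-b', '--mime-type', path)
  raise "file(1) failed for #{path}" unless status.success?

  out.strip
end
```

Usage would look like `blocked_mime?(detect_mime('upload.bin'))`. An old file that still reports `application/zip` for a .docx will keep blocking it, which is why the update is needed first.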

I used the MimeMagic gem and added custom magic rules (the gem's own term) to identify the xlsx, docx, and pptx formats. This approach does not rely on the file extension.

These are the magic rules I added:

require 'mimemagic'

# Each entry is [mime_type, rules]; a rule is [offset, bytes, nested_rules], and nested rules must match too.
[
  ['application/vnd.openxmlformats-officedocument.wordprocessingml.document.custom', [[0, "PK\x03\x04", [[30, '_rels/.rels', [[0..5000, 'word/']]]]]]],
  ['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.custom', [[0, "PK\x03\x04", [[30, '_rels/.rels', [[0..5000, 'xl/']]]]]]],
  ['application/vnd.openxmlformats-officedocument.presentationml.presentation.custom', [[0, "PK\x03\x04", [[30, '_rels/.rels', [[0..5000, 'ppt/']]]]]]],
  ['application/vnd.openxmlformats-officedocument.wordprocessingml.document.custom', [[0, "PK\x03\x04", [[30, 'word/']]]]],
  ['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.custom', [[0, "PK\x03\x04", [[30, 'xl/']]]]],
  ['application/vnd.openxmlformats-officedocument.presentationml.presentation.custom', [[0, "PK\x03\x04", [[30, 'ppt/']]]]]
].each do |type, rules|
  MimeMagic.add(type, magic: rules)
end
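
To make the rules above easier to read, here is a rough dependency-free sketch of the same checks in plain Ruby. It is an illustration of the heuristic, not the gem's implementation: `sniff_ooxml` and the constants are hypothetical names, and the 64-byte window used to read the first entry name is an arbitrary assumption. Offset 30 is where the first entry's file name starts, because a ZIP local file header has 30 fixed bytes before the name.

```ruby
# Hypothetical helper mirroring the custom magic rules without MimeMagic.
OOXML_DIRS = { 'word/' => :docx, 'xl/' => :xlsx, 'ppt/' => :pptx }.freeze

ZIP_SIGNATURE = "PK\x03\x04".b # local file header signature
NAME_OFFSET   = 30             # first entry name follows the 30-byte fixed header
SCAN_LIMIT    = 5000           # same window as the 0..5000 range in the rules

# Returns :docx, :xlsx, :pptx, :zip for any other ZIP, or nil for non-ZIP data.
def sniff_ooxml(bytes)
  data = bytes.b
  return nil unless data.byteslice(0, 4) == ZIP_SIGNATURE

  first_name = data.byteslice(NAME_OFFSET, 64) || ''
  rels_first = first_name.start_with?('_rels/.rels')

  OOXML_DIRS.each do |dir, kind|
    # Simple rule: the first entry already sits in the format's directory.
    return kind if first_name.start_with?(dir)

    # Nested rule: _rels/.rels comes first and the directory name appears early on.
    window = data.byteslice(0, SCAN_LIMIT + dir.bytesize)
    return kind if rels_first && window.include?(dir)
  end

  :zip
end
```

For example, `sniff_ooxml(File.binread(path, 8192))` reads only the start of the file. Like the magic rules, this is a heuristic: a hand-crafted ZIP whose first entry is named `word/...` would also be reported as docx.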
