
Check file type in Linux

I want to check certain files and see whether their types and extensions match. Currently I use the file command to get the mime type (or the basic output from file) and compare it with the file extension. However, some file types return the same mime type, .sfx and .dll for example.

I also have some files with no extension at all, and I need to determine their file type correctly.

I want to be able to detect all file types correctly, but the file types I'm currently most interested in are:

  • dll
  • msi
  • com
  • cpl
  • exe
  • ocx
  • tmp
  • upd

Is there any other tool that checks and returns a file's type?

EDIT

I wrote a Node.js script that can be used as a Linux command. I created my own file signature database by merging public databases; it has the following format for each file extension:

"ISO" : [
    {
        "signature": "4344303031", // byte sequence
        "size": 5, // size of byte sequence
        "offset": 32769 // offset in the file for the signature bytes
    },
    {
        "signature": "4344303031",
        "size": 5,
        "offset": 34817
    },
    {
        "signature": "4344303031",
        "size": 5,
        "offset": 36865
    }
]

First, I take the extension from the file's name (text.iso gives .iso) and check that extension's signature bytes against the file to see whether it really is an ISO file. If it is, I return iso as the result.

If it's not an ISO, I check the signature byte sequences for every extension in my database against the given file to see whether any of them match. If one matches, I return that extension.

If I cannot find a match, I run the file command, get the file's mime type, and look it up in another database I created that maps mime types to extensions. The format of the mime-type database is:
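The two signature-matching steps above could be sketched roughly as follows. This is a Python illustration, not the original Node.js script: the `SIGNATURES` dictionary, the `PDF` entry, and the function names `matches` and `detect` are assumptions made for the example, while the field names follow the JSON format shown earlier.

```python
import os

# Hypothetical in-memory copy of the signature database, using the same
# fields as the JSON format above (hex signature, size in bytes, offset).
SIGNATURES = {
    "ISO": [
        {"signature": "4344303031", "size": 5, "offset": 32769},
        {"signature": "4344303031", "size": 5, "offset": 34817},
        {"signature": "4344303031", "size": 5, "offset": 36865},
    ],
    # Extra entry added only for illustration: "%PDF" at the start of the file.
    "PDF": [{"signature": "25504446", "size": 4, "offset": 0}],
}


def matches(path, entries):
    """Return True if any signature entry matches the bytes in the file."""
    with open(path, "rb") as f:
        for entry in entries:
            f.seek(entry["offset"])
            if f.read(entry["size"]) == bytes.fromhex(entry["signature"]):
                return True
    return False


def detect(path, db=SIGNATURES):
    """Try the extension from the file name first, then every other entry."""
    claimed = os.path.splitext(path)[1].lstrip(".").upper()
    if claimed in db and matches(path, db[claimed]):
        return claimed
    for ext, entries in db.items():
        if ext != claimed and matches(path, entries):
            return ext
    return None  # no signature matched; the caller can fall back to file
```

With this sketch, a file named doc.iso that actually starts with `%PDF` would be reported as `PDF`, and a file with no matching signature returns `None`.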

"application/atom+xml": [
    "atom",
    "xml"
],
"application/atomcat+xml": [
    "atomcat"
],
"application/atomsvc+xml": [
    "atomsvc"
]
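A minimal Python sketch of this mime-type fallback, assuming the file command is on the PATH. `MIME_TO_EXT` is a hypothetical excerpt of the database above, and `mime_type` and `extensions_for` are names chosen for the example.

```python
import subprocess

# Hypothetical excerpt of the mime-type database shown above.
MIME_TO_EXT = {
    "application/atom+xml": ["atom", "xml"],
    "application/atomcat+xml": ["atomcat"],
    "application/atomsvc+xml": ["atomsvc"],
}


def mime_type(path):
    """Ask the file command for the mime type of path."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def extensions_for(path, db=MIME_TO_EXT, lookup=mime_type):
    """Return the candidate extensions for path, or an empty list."""
    return db.get(lookup(path), [])
```

The `lookup` parameter only exists so the mapping can be exercised without running file. A mime type with several extensions, such as `application/atom+xml`, returns all candidates and leaves the final choice to the caller.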

This solution currently meets my project's needs. Maybe it will help someone else as well.

Using Python, after pip install filemagic:

>>> import magic
>>> with magic.Magic() as m: m.id_filename('tmp.py')
... 
'Python script, ASCII text executable'
>>> with magic.Magic() as m: m.id_filename('test.html')
... 
'HTML document, ASCII text'

Linux has a built-in file command; see man file.

The main difference between Windows and *nix is that DOS/Windows has built-in dependencies on the file suffix. For example, an executable must have the suffix ".exe" (or ".com"); a batch file must have the suffix ".bat" (or ".cmd").

Linux, macOS, BSD, etc. have no such restriction. Instead, a file must have the "execute" permission set in order to be runnable. This is true for both binary executables (e.g. compiled code) and scripts (e.g. Python, Perl, or a shell script).

Instead of relying only on file suffix, the "file" command also looks at self-identifying "magic numbers" or other "header information" in the file itself.
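As an illustration of reading header information directly, here is a hedged Python sketch for the Windows file types in the question. It assumes the standard PE layout: an "MZ" magic number, a 4-byte pointer at offset 0x3C to the "PE\0\0" signature, and a COFF Characteristics field in which the 0x2000 bit marks a DLL image. The function name `pe_kind` is made up for the example, and the header does not record the file extension itself, so this only separates DLL images from other PE images.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag set on DLL images


def pe_kind(path):
    """Return 'dll', 'exe', or None if the file is not a PE image."""
    with open(path, "rb") as f:
        header = f.read(64)
        if len(header) < 64 or header[:2] != b"MZ":
            return None
        # The 4-byte field at offset 0x3C points to the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        f.seek(pe_offset)
        coff = f.read(24)  # 4-byte signature + 20-byte COFF header
    if len(coff) < 24 or coff[:4] != b"PE\0\0":
        return None
    # Characteristics is the last 2 bytes of the COFF header.
    (characteristics,) = struct.unpack_from("<H", coff, 22)
    return "dll" if characteristics & IMAGE_FILE_DLL else "exe"
```

Here "exe" just means a PE image without the DLL flag; plain DOS programs with no PE header return `None`.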

SUGGESTION:

If the built-in "file" doesn't meet your needs, perhaps you can wrap it in a shell script that:

1) Checks for certain "well-known suffixes" (basename strips the directory part; shell parameter expansion such as ${name##*.} then extracts the suffix), and/or

2) Calls "file" as a fallback
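The suggested wrapper could look roughly like this in Python rather than shell. The `KNOWN_SUFFIXES` table and the names `run_file` and `describe` are hypothetical, and the sketch assumes the file command is installed.

```python
import subprocess
from pathlib import Path

# Hypothetical table of suffixes to trust without inspecting the content.
KNOWN_SUFFIXES = {
    ".iso": "ISO 9660 image",
    ".msi": "Windows Installer package",
}


def run_file(path):
    """Return the brief description printed by the file command."""
    out = subprocess.run(
        ["file", "--brief", "--", str(path)],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def describe(path, fallback=run_file):
    """Use the suffix table when possible, otherwise fall back to file."""
    suffix = Path(path).suffix.lower()
    if suffix in KNOWN_SUFFIXES:
        return KNOWN_SUFFIXES[suffix]
    return fallback(path)
```

Note that step 1 trusts the suffix without looking at the content, so it suits cases where file gives an ambiguous answer rather than cases where the suffix itself is in doubt.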
