How to find binary files in a directory?

I need to find the binary files in a directory. I want to do this with file, and then check the results with grep. My problem is that I don't know what counts as a binary file. What does the file command print for binary files, and what should I look for with grep?

The commands below find all non-text (binary) files, including empty files.

Edit

Solution with only grep (from Mehrdad's comment):

grep -rIL .

Original answer

This requires no tools other than find and grep:

find . -type f -exec grep -IL . "{}" \;

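-I tells grep to treat binary files as if they contained no match

-L prints only the names of files that contain no match

. is a regular expression that matches any single character, so every non-empty text file matches and is left out of the list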


Edit 2

This finds all non-empty binary files:

find . -type f ! -size 0 -exec grep -IL . "{}" \;
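
The same command can be wrapped in a small function. This is only a sketch: the name list_binary_files is made up for illustration, and it assumes a grep that supports -I (GNU grep does). Using "{} +" instead of "{} \;" lets find pass many file names to each grep call rather than starting one grep per file.

```shell
# list_binary_files DIR: print non-empty regular files under DIR that grep
# treats as binary. Hypothetical helper name; assumes grep supports -I.
list_binary_files() {
    dir=${1:-.}
    # -I: treat binary files as non-matching; -L: list files without a match.
    # "{} +" batches file names so grep is started as few times as possible.
    find "$dir" -type f ! -size 0 -exec grep -IL . {} +
}
```

Call it as list_binary_files /some/dir, or with no argument to scan the current directory. A file that contains only newline characters is also listed, because . never matches an empty line.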

Worth mentioning: Perl's -T test for text files, and its opposite, -B, for binary files.

$ find . -type f | perl -lne 'print if -B'

will print out any binary files it sees. Use -T if you want the opposite: text files.

It's not totally foolproof as it only looks in the first 1,000 characters or so, but it's better than some of the ad-hoc methods suggested here. See man perlfunc for the whole rundown. Here is a summary:

The "-T" and "-B" switches work as follows. The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters. If so, it's a "-T" file. Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set. If more than a third of the characters are strange, it's a "-B" file; otherwise it's a "-T" file. Also, any file containing a zero byte in the examined portion is considered a binary file.

My first answer to the question fell pretty much in line with the others here that use the find command. I think your instructor wanted to introduce you to the concept of magic numbers through the file command, which uses them to classify files into many types.

For my purposes, it was as simple as:

file * | grep executable

But it can be done in numerous ways.
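
One of those ways, for the broader "text or binary" question rather than just executables, is to ask file for the character encoding it detects. The sketch below assumes a file(1) that supports --mime-encoding, which prints binary for content it does not recognise as text. The function name list_by_file_encoding is invented for this example.

```shell
# list_by_file_encoding DIR: print non-empty regular files under DIR whose
# encoding is reported as "binary" by file(1). Hypothetical helper name;
# assumes file supports the --mime-encoding option.
list_by_file_encoding() {
    dir=${1:-.}
    # file prints "NAME: ENCODING"; keep lines ending in "binary" and
    # strip that suffix so only the file name remains.
    find "$dir" -type f ! -size 0 -exec file --mime-encoding {} + |
        sed -n 's/: *binary$//p'
}
```

Because the output of file is parsed as text, a file name that contains a newline will confuse this version.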

As this is an assignment, you would probably hate me if I gave you the complete solution ;-) So here is a little hint:

By default, grep reports a matching binary file by name instead of printing its matching lines. So search for a regular expression such as ., which matches in any non-empty file:

grep . *

Output:

[...]
Binary file c matches
Binary file e matches

You can use awk to get the filenames only and ls to print the permissions. See the respective man pages (man grep, man awk, man ls).

In these modern times (2020 is practically the third decade of the 21st century, after all), I think the right question is: how do I find all the non-UTF-8 files? UTF-8 is the modern equivalent of a text file.

UTF-8 encoding of text with non-ASCII code points introduces non-ASCII bytes (i.e., bytes with the most significant bit set). However, not all sequences of such bytes form valid UTF-8 sequences.

isutf8 from the moreutils package is what you need.

$ isutf8 -l /bin/*
/bin/[
/bin/acyclic
/bin/addr2line
/bin/animate
/bin/applydeltarpm
/bin/apropos
⋮

A quick check:

$ file $(isutf8 -l /bin/*)
/bin/[:             ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4d70c2142fc672d8a69d033ecb6693ec15b1e6fb, for GNU/Linux 3.2.0, stripped
/bin/acyclic:       ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d428ea52eb0e8aaf7faf30914710d8fbabe6ca28, for GNU/Linux 3.2.0, stripped
/bin/addr2line:     ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=797f42bc4f8fb754a49b816b82d6b40804626567, for GNU/Linux 3.2.0, stripped
/bin/animate:       ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=36ab46e69c1bfea433382ffc9bbd9708365dac2b, for GNU/Linux 3.2.0, stripped
/bin/applydeltarpm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a1fddcbeec9266e698782596f2dfd1b4f3e0b974, for GNU/Linux 3.2.0, stripped
/bin/apropos:       symbolic link to whatis
⋮

You may wish to invert the test and get all the text files. Use -i:

$ isutf8 -il /bin/*
/bin/alias
/bin/bashbug
/bin/bashbug-64
/bin/bg
⋮
$ file -L $(isutf8 -il /bin/*)
/bin/alias:      a /usr/bin/sh script, ASCII text executable
/bin/bashbug:    a /usr/bin/sh - script, ASCII text executable, with very long lines
/bin/bashbug-64: a /usr/bin/sh - script, ASCII text executable, with very long lines
/bin/bg:         a /usr/bin/sh script, ASCII text executable
⋮

Yeah, it reads the whole file, but it's pretty speedy, and if you want accuracy…

I think the best tool for determining the nature of a file is the file utility. In one of my directories, only one file is identified as binary by the Nautilus file manager. For that file alone, the command ls | xargs file returns "data" without any further information.

On Linux, compiled executables and shared libraries use the ELF format.

When you run the file command on such a file, the output contains the word ELF. You can grep for this.

On command line:

file <binary_file_name>

So, if you want to find the binary files inside a directory (in linux for example), you can do something like this:

ls | xargs file | grep ELF
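
The ls | xargs pipeline breaks on file names that contain spaces. As an alternative sketch, the ELF magic number can be checked directly: an ELF file starts with the byte 0x7f followed by the letters ELF. The function name list_elf_files is hypothetical, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
# list_elf_files DIR: print regular files under DIR that begin with the ELF
# magic number (0x7f followed by "ELF"). Hypothetical helper name.
list_elf_files() {
    dir=${1:-.}
    find "$dir" -type f -exec sh -c '
        for f do
            # Read the first four bytes and render them as hex: 7f454c46.
            magic=$(head -c 4 "$f" | od -An -tx1 | tr -d " \n")
            if [ "$magic" = 7f454c46 ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Unlike grepping the output of file, this does not match a text file whose name or description merely contains the word ELF.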

You can use find with the -executable test, which is basically what you want.

The man page says:

   -executable
          Matches files which are executable and directories which are searchable (in a file name resolution sense). This takes into account access control lists and other permissions artefacts which the -perm test ignores. This test makes use of the access(2) system call, and so can be fooled by NFS servers which do UID mapping (or root-squashing), since many systems implement access(2) in the client's kernel and so cannot make use of the UID mapping information held on the server. Because this test is based only on the result of the access(2) system call, there is no guarantee that a file for which this test succeeds can actually be executed.

Here is an example of the result:

# find /bin  -executable -type f | grep 'dmesg'
/bin/dmesg
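
Note that -executable tests permissions, not content, so executable shell scripts match as well. If only executable binary files are wanted, one option is to combine it with a grep content check. This is a sketch that assumes GNU find (for -executable) and a grep that supports -I. The function name list_executable_binaries is made up for the example.

```shell
# list_executable_binaries DIR: print non-empty regular files under DIR that
# are executable by the current user and that grep treats as binary.
# Hypothetical helper name; assumes GNU find and a grep with -I.
list_executable_binaries() {
    dir=${1:-.}
    # -executable filters on permissions; grep -IL filters on content.
    find "$dir" -type f -executable ! -size 0 -exec grep -IL . {} +
}
```

For example, list_executable_binaries /bin would skip the shell scripts in /bin and print the compiled programs.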
