
grep based on blacklist — without procedural code?

It's a well-known task, simple to describe:

Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.

A common application is filtering compiler warnings from a build log so that warnings about files that are not yours are ignored. Here foo.txt is the warnings file (itself filtered from the build log), and excluded_filenames.txt is the blacklist, with one file name per line.

I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.

But I feel that I should be really close with xargs, and just can't see the last step.

I know that if excluded_filenames.txt has only 1 file name in it, then

grep -v "`cat excluded_filenames.txt`" foo.txt

will do it.

And I know that I can get the filenames one per line with

xargs -L1 -a excluded_filenames.txt

So how do I combine those two into a single solution, without explicit loops in a procedural language?

Looking for the simple and elegant solution.

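You should use the -f option, which reads the patterns from a file. (fgrep is the same as grep -F, not -f; add -F as well if the file names should be matched literally rather than as regular expressions.)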

grep -vf excluded_filenames.txt foo.txt
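
As a rough illustration of the command above, here is a small self-contained run. The warning lines and file names are made up for the example, and it works in a scratch directory so nothing real is overwritten:

```shell
# Hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)"
printf '%s\n' \
  'mine/main.c:10: warning: unused variable' \
  'vendor/lib.c:22: warning: implicit declaration' \
  'mine/util.c:5: warning: shadowed variable' > foo.txt
printf '%s\n' 'vendor/lib.c' > excluded_filenames.txt

# -f reads one pattern per line from the blacklist,
# -v keeps only the lines that match none of them.
grep -vf excluded_filenames.txt foo.txt > foo_filtered.txt
cat foo_filtered.txt
```

Only the two mine/ lines should remain in foo_filtered.txt.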

You could also use -F which is more directly the answer to what you asked:

grep -vF "`cat excluded_filenames.txt`" foo.txt

from man grep

-f FILE, --file=FILE
          Obtain patterns from FILE, one per line.  The empty file contains zero patterns, and therefore matches nothing.

-F, --fixed-strings
          Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
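
Since file names usually contain dots, combining -F with -f is probably what you want for this task. The sketch below, using invented sample lines, shows the difference: without -F the entry a.c is a regular expression whose dot matches any character.

```shell
# Hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)"
printf '%s\n' 'src/a.c: warning: unused' 'src/abc: warning: unused' > foo.txt
printf '%s\n' 'a.c' > excluded_filenames.txt

# Regex mode: "a.c" also matches "abc", so both lines are excluded.
# grep exits with status 1 when it selects no lines, hence "|| true".
grep -vf excluded_filenames.txt foo.txt > regex_filtered.txt || true

# Fixed-string mode: only lines containing the literal text "a.c" are excluded.
grep -vFf excluded_filenames.txt foo.txt > fixed_filtered.txt
cat fixed_filtered.txt
```

Here regex_filtered.txt ends up empty, while fixed_filtered.txt keeps the src/abc line.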

