
How to make grep stop searching each file after N lines?

The use case is best described with a hypothetical example:

Suppose you are searching for useful header information in a large collection of stored emails (each email in a separate file), e.g. to compile statistics on the most-used mail client applications.

Normally with grep you can specify -m to stop at the first match, but what if an email does not contain X-Mailer, or whatever header we are looking for? grep will then scan through the whole email. Since most headers are under 50 lines, performance could be improved by telling grep to search only the first 50 lines of each file. I could not find a way to do that.

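I don't know whether it would be faster, but you could do this with awk:

 awk '/match me/{print;nextfile}FNR>=50{nextfile}' *.mail

For each file, this prints the first line matching match me if it appears in the first 50 lines, then moves on to the next file. nextfile is used rather than exit because exit would end the whole awk run after the first file instead of skipping to the next one; gawk, BSD awk and current mawk releases support it. (If you want to print the filename as well, grep style, change print; to print FILENAME ":" $0; )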
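The awk approach above can be wrapped in a small shell function so that the pattern and the line limit become parameters. This is only a sketch: the name grep_head and its argument order are made up for illustration, and it assumes an awk that supports nextfile, so that awk skips to the next file instead of stopping altogether.

```shell
# grep_head PATTERN LIMIT FILE...   (hypothetical helper, not a standard tool)
# Print "file:line" for the first line matching PATTERN (an awk regex) found
# within the first LIMIT lines of each FILE, then move on to the next file.
# Assumes awk supports nextfile. Note that -v processes backslash escapes
# in the pattern.
grep_head() {
    pattern=$1
    limit=$2
    shift 2
    awk -v pat="$pattern" -v limit="$limit" '
        $0 ~ pat     { print FILENAME ":" $0; nextfile }  # first hit: done with this file
        FNR >= limit { nextfile }                          # line limit reached: give up
    ' "$@"
}
```

It could be called as grep_head '^X-Mailer:' 50 *.mail. The match rule comes first so that line number LIMIT itself is still tested before the file is abandoned.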

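awk doesn't have any equivalent to grep's -r flag, but if you need to scan directories recursively, you can use find with -exec. Because {} + passes many files to a single awk process, the script uses nextfile to skip to the next file (exit would stop after the first one):

find /base/dir -iname '*.mail' \
     -exec awk '/match me/{print FILENAME ":" $0;nextfile}FNR>=50{nextfile}' {} +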
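For the statistics use case from the question, the recursive find and awk combination can feed a sort and uniq pipeline. The following is a sketch under several assumptions: the function name top_mailers is invented here, the header is assumed to be spelled exactly X-Mailer: at the start of a line, the files are assumed to end in .mail, and awk is assumed to support nextfile.

```shell
# top_mailers DIR [LIMIT]   (hypothetical helper)
# Count the X-Mailer values found within the first LIMIT lines (default 50)
# of every *.mail file under DIR, most frequent first.
top_mailers() {
    dir=$1
    limit=${2:-50}
    find "$dir" -type f -iname '*.mail' -exec awk -v limit="$limit" '
        /^X-Mailer:/ {
            sub(/^X-Mailer:[ \t]*/, "")   # keep only the header value
            print
            nextfile                       # stop reading this file
        }
        FNR >= limit { nextfile }          # line limit reached: skip the rest
    ' {} + | sort | uniq -c | sort -rn
}
```

Running top_mailers /base/dir prints one line per distinct mail client, prefixed by the number of files it was found in. Only the first X-Mailer line of each file is counted, and files with no such header in the first LIMIT lines contribute nothing.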

You could also solve this by piping head -n 50 into grep, but that would very likely be slower, since you'd have to start two new processes (one head and one grep) for each file. You could do it with just one head and one grep for all files, but then you'd lose the ability to stop reading a file as soon as you find the magic line, and it would be awkward to label the lines with the filename.

You can do something like this:

head -n 50 <mailfile> | grep <your keyword>

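Try this command (the --label option is specific to GNU grep):

for i in *
do
    # quote "$i" so filenames containing spaces are handled correctly
    head -n 50 "$i" | grep -H --label="$i" pattern
done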

output:

1.txt: aaaaaaaa pattern aaaaaaaa
2.txt: bbbb pattern bbbbb

Another option:

ls *.txt | xargs head -n <N lines> | grep 'your_string'
