
How to grep multiple strings within N lines

I was wondering whether there is any way to use grep (or any other command) to search for multiple strings within N lines.

Example

Search for "orange", "lime", "banana" all within 3 lines

If the input file is

xxx
a lime
b orange
c banana
yyy
d lime
foo
e orange
f banana

I want to print the three lines starting with a, b, c. The lines with the searched strings can appear in any order.

I do not want to print the lines d, e, f, as there is a line in between, and so the three strings are not grouped together.

Your question is rather unclear. Here is a simple Awk script which collects consecutive matching lines and prints them only if it collected three or more.

awk '/orange|lime|banana/ { a[++n] = $0; next }
    { if (n>=3) for (i=1; i<=n; i++) print a[i]; delete a; n=0 }
    END { if (n>=3) for (i=1; i<=n; i++) print a[i] }' file

It's not clear whether you require all of your expressions to match; this one doesn't attempt to check that. If you have three successive lines containing orange, that's a match, and they will be printed.

The logic should be straightforward. The array a collects matches, with n indexing into it. When we see a non-match, we check its length, and print if it's 3 or more, then start over with an empty array and index. This is (clumsily) repeated at end of file as well, in case the file ends with a match.

If you want to permit gaps, you could instead always keep an array of the last three lines. For example, should three lines be printed when the first matches "orange" and "banana", the second matches nothing, and the third matches "lime"? Your question does not say. You would then also need to specify how to deal with, say, a sequence of five lines which all match by these rules.

Similar to tripleee's answer, I would also use awk for this purpose. The main idea is to implement a simple state machine.

Simple example

As a simple example, first try to find three consecutive lines containing banana. Consider the pattern-action statement

/banana/ { bananas++ }

For every line matching the regex banana, it increments the variable bananas (in awk, all variables are initialised with 0).

Of course, you want bananas to be reset to 0 when there is a non-matching line, so that the search starts over:

/banana/ { bananas++; next }
{ bananas = 0 }

You can also test the values of variables inside an action. For example, if you want to print "Found" after three consecutive lines containing banana, extend the rule:

/banana/ {
    bananas++
    if (bananas >= 3) {
        print "Found"
        bananas = 0
    }
    next
}

Once the count reaches three, this prints the string "Found" and resets the variable bananas to 0.
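Putting the two rules together gives a complete program. The following is only a minimal sketch: the wrapper name count_banana_runs and the NR in the message are additions for illustration, not part of the snippets above.

```shell
# count_banana_runs is a hypothetical wrapper name, used here for illustration only.
count_banana_runs() {
    awk '
    /banana/ {
        bananas++                  # one more consecutive matching line
        if (bananas >= 3) {
            print "Found at line " NR
            bananas = 0            # start counting a fresh run
        }
        next                       # skip the reset rule below
    }
    { bananas = 0 }                # a non-matching line breaks the run
    ' "$@"
}

printf '%s\n' banana banana apple banana banana banana | count_banana_runs
```

With this input the first run is broken by apple after two lines, so the only message is printed for the second run, at line 6.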

How to proceed further

Using this basic idea, you should be able to write your own awk script that handles all the cases. First, you should familiarise yourself with awk (pattern, actions, program execution).

Then, extend and adapt my example to fit your needs.

  • In particular, you probably need an associative array matched, with indices "banana", "orange", "lime".
  • You set matched["banana"] = $0 when the current line matches /banana/. This saves the current line for later output.
  • You clear that whole array when the current line does not match any of your expressions.
  • When all strings are found (matched[s] is not empty for every string s), you can print the contents of matched[s].

I leave the actual implementation to you. As others have said, your description leaves many corner-cases unclear. You should figure them out for yourself and adapt your implementation accordingly.
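For readers who want a starting point, here is one possible reading of the outline above, as a sketch rather than a finished solution. The function name print_fruit_run and the helper arrays seen_at and out are assumptions of this sketch (they are only there so that the lines come out in input order), and it uses fixed-string matching via index() instead of regexes.

```shell
# print_fruit_run is a hypothetical name; seen_at and out are additions of this
# sketch (not part of the outline above) used to print lines in input order.
print_fruit_run() {
    awk '
    function reset() {             # portable way to empty both arrays
        split("", matched)
        split("", seen_at)
    }
    BEGIN { nwords = split("orange lime banana", words, " ") }
    {
        hit = 0
        for (i = 1; i <= nwords; i++)
            if (index($0, words[i])) {     # fixed-string test, not a regex
                matched[words[i]] = $0     # save the line for later output
                seen_at[words[i]] = NR
                hit = 1
            }
        if (!hit) { reset(); next }        # a non-matching line ends the run

        lo = NR
        for (i = 1; i <= nwords; i++) {
            if (!(words[i] in matched)) next    # still missing a string
            if (seen_at[words[i]] < lo) lo = seen_at[words[i]]
        }
        split("", out)
        for (i = 1; i <= nwords; i++) out[seen_at[words[i]]] = matched[words[i]]
        for (ln = lo; ln <= NR; ln++) if (ln in out) print out[ln]
        reset()
    }
    ' "$@"
}

printf '%s\n' xxx 'a lime' 'b orange' 'c banana' yyy 'd lime' foo 'e orange' 'f banana' |
    print_fruit_run
```

On the sample input from the question this prints the lines a, b and c only. Note that it inherits the open corner cases: it does not cap the length of a run, so a long run of matching lines can still yield three lines that are not adjacent, and a repeated string keeps only its most recent line.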

I think you want this:

awk '
  /banana/ {banana=3}
  /lime/   {lime=3}
  /orange/ {orange=3}
 (orange>0)&&(lime>0)&&(banana>0){print l2,l1,$0}
 {orange--;lime--;banana--;l2=l1;l1=$0}' OFS='\n' yourFile

So, if you see the word banana you set banana=3, so it stays valid for three lines: the current one and the two that follow. Likewise, if you see lime, give it 3 lines of chances to make a group, and similarly for orange.

Now, if all of orange, lime and banana have been seen within the last three lines (the current one included), print the second to last line (l2), the last line (l1) and the current line $0.

Finally, decrement the count for each fruit before moving on to the next line, and shift the saved lines back by one: the previous line becomes l2 and the current line becomes l1.
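The same idea can be generalised from 3 to N lines by remembering the line number at which each string was last seen and keeping the last N lines in a ring buffer. This is a sketch under stated assumptions, not a drop-in replacement: the function name grep_within is hypothetical, matching is by fixed string via index(), only the span from the oldest match to the current line is printed (instead of always three lines), and the state is cleared after a hit so that overlapping groups are not printed twice.

```shell
# grep_within is a hypothetical helper: grep_within N [FILE...]
grep_within() {
    window=$1
    shift
    awk -v n="$window" '
    BEGIN { nwords = split("orange lime banana", words, " ") }
    {
        buf[NR % n] = $0                        # ring buffer of the last n lines
        for (i = 1; i <= nwords; i++)
            if (index($0, words[i])) last[words[i]] = NR

        oldest = NR
        for (i = 1; i <= nwords; i++) {
            if (!(words[i] in last)) next       # some word not seen yet
            if (last[words[i]] < oldest) oldest = last[words[i]]
        }
        if (NR - oldest < n) {                  # all words fall inside the window
            for (ln = oldest; ln <= NR; ln++) print buf[ln % n]
            split("", last)                     # forget matches after a hit
        }
    }
    ' "$@"
}

printf '%s\n' xxx 'a lime' 'b orange' 'c banana' yyy 'd lime' foo 'e orange' 'f banana' |
    grep_within 3
```

With a window of 3 this prints the lines a, b and c for the sample input. With a window of 4 it would additionally print the lines d, foo, e and f, since that group then fits inside the window.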
