
How to grep for a line above a previously matched line

I have log files in which the date is only appended periodically. My log file looks something like this:

Monday 2017
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo ALARM foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo ALARM foo foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo foo

I am making a script that goes something like this:

grep 'ALARM' myfile.log | tail -1

I need to search for the previous date entry above the last alarm and include it in my results. I have no idea how many lines above the matched alarm line it will be.

Desired output:

Monday 2017
foo foo foo foo foo foo ALARM foo foo foo foo foo

Assuming the date pattern is Monday 2017:

grep -E 'Monday 2017|ALARM' myfile.log | grep -B1 'ALARM'

The second grep removes the extra date lines between ALARM matches: only each ALARM line and the line directly before it are kept.
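A minimal sketch wrapping the two greps in a shell function that reads the log on stdin. The function name date_and_alarms is hypothetical, and the generalized date regex (a capitalized word ending in "day", a space, and four digits) is an assumption; plain 'Monday 2017' works the same way. It also passes GNU grep's --no-group-separator, which stops -B1 from printing "--" between groups of lines that are not adjacent.

```shell
# Hypothetical helper: print every ALARM line together with the line directly
# before it after the first filter (normally the latest date line).
# Assumes date lines look like "Monday 2017"; adjust the regex as needed.
date_and_alarms() {
    # Step 1: keep only date lines and ALARM lines.
    # Step 2: keep each ALARM line plus one line of leading context;
    #         --no-group-separator (GNU grep) drops the "--" separator lines.
    grep -E '^[A-Z][a-z]+day [0-9]{4}$|ALARM' |
        grep --no-group-separator -B1 'ALARM'
}
```

For example, date_and_alarms < myfile.log. A date line that is not followed by any ALARM before the next date is dropped by the second grep.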

EDIT: Rereading the question, it seems only the last line matching ALARM is wanted. I would do that with the following Perl one-liner:

perl -ne 'if(/Monday 2017/){$last_date=$_}if(/ALARM/){$date=$last_date;$line=$_}END{print $date,$line}' <<END
Monday 2017
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo ALARM foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo ALARM foo foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo foo
END

You can use tac to reverse a stream line by line (run seq 10 | tac to see what it does). Be warned that this is not cheap, but if your input is small enough, it can provide a simple solution:

grep -B 9999999 lastSearchTerm my.log | tac | grep -B 9999999 firstSearchTerm | tac

This will print the block from the firstSearchTerm to the lastSearchTerm.

grep -B 9999999 lastSearchTerm my.log | tac | tail -n +2 | grep -m 1 lastBeforeTerm

This will print only the last line containing lastBeforeTerm before the lastSearchTerm.
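The two templates are easiest to understand on numbered input. This is a sketch: the function names block_between and last_before are hypothetical, and the argument order (the earlier term first, then the last search term) is an arbitrary choice.

```shell
# Hypothetical wrappers around the two tac templates; both read the log on stdin.

# block_between FIRST LAST: from the first FIRST line through the last LAST line.
block_between() {
    grep -B 9999999 -- "$2" | tac | grep -B 9999999 -- "$1" | tac
}

# last_before TERM LAST: the last TERM line that comes before the last LAST line.
last_before() {
    # After reversing, the LAST line comes first; tail -n +2 drops it.
    grep -B 9999999 -- "$2" | tac | tail -n +2 | grep -m 1 -- "$1"
}
```

With these, seq 10 | block_between '^3$' '^7$' prints 3 through 7, and seq 10 | last_before '^[1-5]$' '^7$' prints 5.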

For your specific case, this should do it:

grep -B 9999999 ALARM my.log | tac | {
  IFS= read -r line
  grep -m 1 '2017'
  echo "$line"
}

(Adjust the 2017 part to match any line which looks like a time stamp.)

Of course, this is not the fastest solution, but it is simple and works for small inputs.
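A sketch of the same pipeline as a reusable function, with the reasoning as comments. It relies on read and grep sharing one stdin inside the { ... } group: when the shell reads from a pipe, read consumes exactly one line, so grep continues from the line after it. The function name is hypothetical, and the date test is the '2017' stand-in from above.

```shell
# Hypothetical wrapper for the brace-group answer; reads the log on stdin.
last_alarm_with_date() {
    # Keep everything up to the last ALARM line, then reverse it, so that
    # line comes first and the lines above it follow in reverse order.
    grep -B 9999999 ALARM | tac | {
        IFS= read -r alarm_line   # -r keeps backslashes; empty IFS keeps edge blanks
        grep -m 1 '2017'          # nearest line above the alarm that looks like a date
        printf '%s\n' "$alarm_line"
    }
}
```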

Awk + tac solution:

Sample myfile.log contents:

some text text text
Sunday 2017
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo ALARM foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo foo foo foo foo foo foo 
bar foo foo foo foo foo ALARM foo foo foo foo foo
bar foo foo foo foo foo foo foo foo foo foo foo
Monday 2017
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo ALARM foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo foo foo foo foo foo foo 
foo foo foo foo foo foo ALARM foo foo foo foo foo
text foo foo foo foo foo foo foo foo foo foo foo

The job:

awk '/ALARM/{ f=1 }f && /^[A-Z][a-z]+ 2[0-9]{3}/{ print; exit }' <(tac myfile.log)
  • tac myfile.log - print the file lines in reverse
  • /ALARM/{ f=1 } - on encountering an ALARM line, set the flag f so that the search for the date line starts
  • /^[A-Z][a-z]+ 2[0-9]{3}/ - pattern indicating a "date" line
  • print; exit - print the current line (the result) and terminate the script immediately

The output:

Monday 2017
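The job above prints only the date line. A hedged variant that also prints the ALARM line below it, as in the desired output, could look like this. The function name is hypothetical, and the year is spelled out as [0-9][0-9][0-9] because some awk implementations do not support {3} interval expressions.

```shell
# Hypothetical variant: print the date line and the last ALARM line below it.
# Date lines are assumed to start with a capitalized word, a space and a 2xxx year.
last_alarm_block() {
    tac | awk '
        !f && /ALARM/ { f = 1; alarm = $0; next }   # first ALARM in reverse = last in file
        f && /^[A-Z][a-z]+ 2[0-9][0-9][0-9]/ {      # nearest date line above it
            print; print alarm; exit
        }
    '
}
```

On the sample file above, last_alarm_block < myfile.log should print Monday 2017 followed by the last foo ... ALARM ... line.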

This assumes that a "date" line is one containing "day" followed by a space and four digits:

tac myfile.log \
    | sed -En '/ALARM/,/day [[:digit:]]{4}/{/day [[:digit:]]{4}/{p;q}}'

Like the other solutions, this uses tac to print the lines in reverse; the sed command then does this:

-n suppresses output by default.

/ALARM/,/day [[:digit:]]{4}/ { # In the range from ALARM to the date
    /day [[:digit:]]{4}/{      # On the line of the date
        p                      # Print just that line
        q                      # Exit
    }
}

The q is there to avoid reading the rest of the file after we've already found what we wanted.

Note that some sed implementations need a semicolon before a closing brace, as in {p;q;}.

An awk solution:

awk 'NF==2 {d=$0}; /ALARM/ { printf("%s\n%s\n", d, $0)}' sample.txt

output:

Monday 2017
foo foo foo ALARM foo foo foo foo foo foo foo foo 
Monday 2017
foo foo foo foo foo foo ALARM foo foo foo foo foo
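The command above prints a pair for every ALARM line. If only the last alarm and its date are wanted, one sketch of the same idea keeps the latest pair and prints it in an END block. The function name is hypothetical, and the extra check that the second field is a four-digit number is an added assumption that keeps two-word ALARM lines from being mistaken for dates.

```shell
# Hypothetical variant: print only the last ALARM line and the date above it.
last_date_and_alarm() {
    awk '
        NF == 2 && $2 ~ /^[0-9][0-9][0-9][0-9]$/ { d = $0 }   # latest date line
        /ALARM/ { pair = d "\n" $0 }                           # latest date + ALARM line
        END { if (pair != "") print pair }                     # nothing if no ALARM
    '
}
```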

We can't do this efficiently with Grep. Here's a simple Sed construct to remember:

sed -n '/before/ {h;n;}; /after/ {x;p;x;p;}' < input.txt

This stores the most recent line matching the pattern before and prints it whenever it encounters a subsequent line matching the pattern after. Then it prints the line matching after as well. To break it down:

  • The -n flag suppresses output of every line; we'll tell Sed to output what we want manually.
  • /before/ - When we find a line matched by the pattern before ...
    • h - Save it to the hold space buffer for later.
    • n - Proceed to the next line.
  • /after/ - When we find a line matched by the pattern after ...
    • x;p - Exchange the line with the contents of the hold buffer (before) and print it.
    • x;p - Swap after back out of the hold buffer and print it.

This runs very quickly because we can filter the input in one pass without the need to pipe the output or reverse the file first.
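The construct can be tried with concrete patterns. This is a sketch: the function name is hypothetical, and the before pattern (a capitalized word ending in "day", a space and four digits, written as a portable BRE) is an assumption about the log format.

```shell
# Hypothetical helper: print every ALARM line, each preceded by the latest date line.
date_before_each_alarm() {
    # before = date line: h saves it, n moves on to the next line
    # after  = ALARM line: x;p prints the saved date, x;p prints the ALARM line
    sed -n '/^[A-Z][a-z]*day [0-9][0-9][0-9][0-9]$/ {h;n;}; /ALARM/ {x;p;x;p;}'
}
```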


Now, let's apply it to the example in the question:

sed -n '/^date pattern$/ {h;n;}; /ALARM/ {x;p;x;p;}' < input.txt

This just plugs the specific patterns into the Sed program described above: it outputs the most recently seen date and the matched line every time it sees ALARM. Because the question only wants the last line containing ALARM after each date, we need to modify the program slightly:

sed -n '
    /^date pattern$/ {
        :alarm
        x
        /ALARM/ {s/^\(date pattern\)\n.*\n\(.*ALARM.*\)$/\1\n\2/;p;}
        # discard the swapped-out buffer and start the next cycle
        d
    }
    /ALARM/ H
    $ b alarm
' < input.txt

Instead of holding just the date line, this buffers the date and each of the lines containing ALARM until Sed encounters the next date, after which it prints the date and the last ALARM line in the hold buffer. We check for the presence of ALARM so we don't print a date when no alarms occurred. :alarm declares a branch label that we can return to using b alarm, as we do for the last line of the file (denoted by $), to handle anything left over in the hold space buffer.

I used [A-Z][a-z]\+day [0-9]\{4\} for the date pattern in each of these examples, but adjust as needed.

Edit: I think I misread the question. It looks like we only want the last alarm line in the entire file and the date before it. If so, reversing the file with Tac first is faster, but consumes more memory:

tac input.txt | sed -n '/ALARM/ {h;:a;n;/^date pattern$/ {p;x;p;q;}; ba;}'

With this approach, we store the last alarm line in the file and print it right after we find and print the date that precedes it. We use q to exit as soon as we find that date, to avoid processing the rest of the file. If we don't have Tac on our system, we can use Sed to reverse a file as well:

sed '1!G;h;$!d' < input.txt | sed ...
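Putting the two seds together gives a tac-free sketch. The function name is hypothetical, the date pattern is an assumed one written as a portable BRE, and the :a;n label syntax follows GNU sed. The reversal keeps the whole file in sed's hold space, so it does not save memory compared with tac.

```shell
# Hypothetical tac-free version: print the date line and the last ALARM line below it.
last_alarm_without_tac() {
    # First sed: 1!G;h;$!d reverses the input by accumulating it in the hold space.
    # Second sed: on the first ALARM of the reversed stream (the last one in the file),
    # hold it and read ahead with n until a date line appears; print the date,
    # swap in the held ALARM line and print it, then quit.
    sed '1!G;h;$!d' |
        sed -n '/ALARM/ {h;:a;n;/^[A-Z][a-z]*day [0-9][0-9][0-9][0-9]$/ {p;x;p;q;}; ba;}'
}
```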
