
How do I grep out multiple lines of the same pattern?

I have a log file that is filled with exceptions that are not useful to me.

An exception is logged every two seconds, and when looking at a log file that contains 24 hours of logging it becomes overwhelming to get to the relevant information I need.

My logs look something like this:

2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....
  and 28 more lines like these.

I want to write a cleaned-up copy of the log to another file.

Obviously grep -v "string" -A29 foo.log > new_file.log doesn't help me filter out those 30 lines.

I also tried several sed and awk statements I saw for similar issues others were having, but none of them seemed to help.

I am more of a network admin getting my feet wet on Linux systems.

Can somebody please help?

This might work for you (GNU sed):

sed '/ERROR java-class - Exception/{:a;$!N;/\n\s*at\s.*/s///;ta;D}' file >new_file

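On each line matching ERROR java-class - Exception, this appends the following line and strips it again if it begins with optional whitespace followed by at. It repeats until it reaches a line that is not part of the trace, then deletes the ERROR line and carries on with that next line. Using the above as a template, other exceptions could be filtered in the same manner.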
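If GNU sed is not available, the same idea can be sketched with awk, which was mentioned in the question. This is only a rough equivalent under the assumption that every trace frame is an indented line starting with at; the function name strip_traces is made up for illustration.

```shell
# strip_traces is a hypothetical helper name; it reads a log on stdin.
strip_traces() {
  awk '
    # An ERROR header starts a trace: drop it and remember we are inside one.
    /ERROR java-class - Exception/ { in_trace = 1; next }
    # While inside a trace, drop indented "at ..." frames.
    in_trace && /^[[:space:]]+at[[:space:]]/ { next }
    # Any other line ends the trace and is printed.
    { in_trace = 0; print }
  '
}
```

It would be called as strip_traces < file > new_file. Because it follows the trace line by line, it does not depend on the trace having exactly 29 frames.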

I'm not sure if there's a way to do it with grep, but it might be easier to use something like Perl:

perl -ne 'BEGIN { $m = 30 } $m = 0 if m/string/; print if $m++ > 29' foo.log > new_file.log

(Here $m is the number of lines since the last line containing string. It starts at 30 so that lines before the first match are printed rather than skipped.)
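The counter idea can also be sketched in awk, with the pattern and the number of trailing lines passed as arguments. The function name drop_block and its two parameters are assumptions made for this example, and the pattern is treated as an awk regular expression.

```shell
# drop_block PATTERN N: hypothetical helper that removes each line matching
# PATTERN plus the N lines after it, reading stdin and writing stdout.
drop_block() {
  awk -v pat="$1" -v n="$2" '
    $0 ~ pat { skip = n + 1 }   # restart the countdown on every match
    skip > 0 { skip--; next }   # still inside the block: drop the line
    { print }                   # outside any block: keep the line
  '
}
```

For the log in the question this would be drop_block "string" 29 < foo.log > new_file.log. Like the one-liner, it assumes a fixed block length, so it removes too much or too little when a trace is shorter or longer than N lines.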

Grepping with -A29 might not work in all scenarios, since the exception trace may have fewer or more than 29 lines after the ERROR line, depending on the exception.

Going just by the log snippet that you have provided, the entire exception trace can be removed using egrep and a regex. Say the file log.txt has the following log statements (good lines as well as lines from exception traces):

A good line that should be captured - 1
2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....
A good line that should be captured - 2
2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....
A good line that should be captured - 3
2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....
A good line that should be captured - 4
2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....
A good line that should be captured - 5
2013-04-21 00:00:00,852 [service name] ERROR java-class - Exception 
  at java.net ......
  at java.apache ....

To retrieve just the lines that are not part of the exception trace use the following egrep:

egrep -vi "(error|(^\s+AT.*)|(^\s*caused.*))" log.txt > /path/to/any/file

-v : inverts the match, so only lines that do not match the regex are printed.

-i : ignores case in the regex. To demonstrate this, "error" is purposely written in lowercase and "AT" in uppercase. Note that error matches anywhere in a line, so any other line containing that word is dropped as well.

(^\s+AT.*) : matches any line starting with whitespace followed by "at" and then any characters.

(^\s*caused.*) : this group is added because Java sometimes emits nested stack traces, whose first line typically starts with "Caused by" and is followed by more stack trace lines starting with " at ...". Including it is optional.

Output of this egrep:

A good line that should be captured - 1
A good line that should be captured - 2
A good line that should be captured - 3
A good line that should be captured - 4
A good line that should be captured - 5
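Because -i makes error match anywhere in a line, a good line that merely mentions an error would be removed too. A slightly stricter sketch is shown here, assuming the log level always appears as uppercase ERROR surrounded by spaces, as in the snippet; filter_log is a made-up name.

```shell
# filter_log is a hypothetical helper name; it reads a log on stdin.
filter_log() {
  # -E: extended regex, -v: print only the lines that do NOT match.
  # Dropped: lines with the ERROR level, indented "at ..." frames,
  # and "Caused by" headers of nested traces.
  grep -Ev ' ERROR |^[[:space:]]+at[[:space:]]|^[[:space:]]*Caused by'
}
```

It would be used as filter_log < log.txt > /path/to/any/file. The match is case-sensitive on purpose, so a line such as "Recovered from error, continuing" is kept.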
