How to capture regex match & line above regex match and send it to a file?

I have a file that contains server IP addresses and the errors reported on those servers.

I need to capture the IP of each server that has reported an error, along with the error message.

I tried the code below, but it captures only the regex match and not the line above it.

import re

with open("log1.txt", 'r') as a, open('output.txt', 'a') as out:
    for line in a:
        if re.match(r'(\d+)', line):
            print(line, file=out)

Input:-

---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

Expected Output:-

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED

Use itertools.tee to make two iterators over your input file - use that to cache the previous line (for output).

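import itertools
import re

with open("log1.txt") as infile, open("output.txt", 'w') as outfile:
    cache, infile = itertools.tee(infile)
    next(infile, None)  # advance one iterator so cache lags one line behind
    for prev, line in zip(cache, infile):
        if re.match(r'\s*\d+', line):  # \s* tolerates indented lines
            # each line keeps its own newline, so write it unchanged
            outfile.write(prev)
            outfile.write(line)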
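
To see what the tee-based pairing actually yields, here is a minimal sketch run against an in-memory copy of one report block instead of log1.txt. The helper name pairs_with_previous and the pattern r"\s*\d{10}\b" are assumptions made for illustration; they are not part of the answer above.

```python
import io
import itertools
import re

def pairs_with_previous(lines, pattern=r"\s*\d{10}\b"):
    """Yield (previous_line, line) for every line that matches pattern."""
    prev_iter, cur_iter = itertools.tee(lines)
    next(cur_iter, None)  # advance one iterator so it runs one line ahead
    for prev, line in zip(prev_iter, cur_iter):
        if re.match(pattern, line):
            yield prev, line

# Hypothetical, shortened sample in the same shape as the question's input.
sample = io.StringIO(
    "-----\n"
    "Errpt report for 192.1.152.10 ##\n"
    "0717032319 T H ent2 ETHERNET DOWN\n"
    "0717032319 T H ent2 PROBLEM RESOLVED\n"
    "-----\n"
)

pairs = list(pairs_with_previous(sample))
for prev, line in pairs:
    print(repr(prev), "->", repr(line))
```

Note that the second pair repeats the first error line as its "previous" line, and that in the question's input the line directly above the first error is blank. Reproducing the expected output therefore needs either de-duplication or a block-oriented approach.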

This expression is likely to return the desired output:

Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+

Test with re.findall

import re

regex = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"

test_str = """
---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

"""

print(re.findall(regex, test_str, re.M))

Output

['Errpt report for 192.1.152.10 ## \n\n    0717032319 T H ent2 ETHERNET DOWN', 'Errpt report for 172.11.71.113 ##  \n\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED', 'Errpt report for 172.1.79.114 ## \n\n    0717032319 T H ent3 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent5 PROBLEM RESOLVED\n    0717032319 T H ent6 PROBLEM RESOLVED']
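
If the goal is to pair each IP address with its error messages, the re.findall matches can be post-processed. The following is a minimal sketch under the assumption that every match has the shape shown in the output above; the function name errors_by_ip and the two inner patterns are hypothetical, introduced only for this example.

```python
import re

# Same expression as above.
REPORT_RE = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"

def errors_by_ip(text):
    """Map each reporting IP address to its list of error lines."""
    result = {}
    for block in re.findall(REPORT_RE, text):
        ip = re.search(r"\d{1,3}(?:\.\d{1,3}){3}", block)
        if ip is None:
            continue  # header without an IP-like token; skip it
        # Error lines start (after optional indentation) with 10 digits.
        errors = re.findall(r"^\s*(\d{10}\s.*)$", block, re.M)
        result.setdefault(ip.group(), []).extend(e.strip() for e in errors)
    return result

sample = """\
---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ##

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ##

---------------------------------------------------------------------
"""

for ip, errors in errors_by_ip(sample).items():
    print(ip, errors)
```

One caveat: because [\s\S]*? can cross report boundaries, a report without errors that is followed by a report with errors would be merged into a single match. The sample data does not exercise that case.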

Demo

The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it. There you can also watch how it matches against some sample inputs.

RegEx Circuit

jex.im visualizes regular expressions.

You could match the whole line containing the hyphens and the first line of the log file and use a repeating pattern to match the following lines that start with 10 digits.

Instead of using re.search, which looks for the first location where the regular expression pattern produces a match, you might use re.findall and write all the matches to your output.txt file.

^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+

Explanation

  • ^ Start of a line (with the re.M flag)
  • -+\r?\n Match - 1+ times, followed by a newline
  • Errpt report for Match literally
  • \d{1,3}(?:\.\d{1,3}){3} ## Match an IP-like pattern, a space and ##
  • [\t ]* Match 0+ times a space or tab
  • (?: Non-capturing group
    • \r?\n\s*\d{10} Match a newline, 0+ whitespace chars and 10 digits
    • [ \t].* Match a space or tab and 0+ times any char except a newline
  • )+ Close the non-capturing group and repeat 1+ times
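
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch of the same expression in annotated form; the name REPORT_BLOCK and the shortened sample string are assumptions made for this example. In verbose mode literal spaces must be written as [ ] and # must be escaped.

```python
import re

REPORT_BLOCK = re.compile(
    r"""
    ^-+\r?\n                      # a line of hyphens, then a newline
    Errpt[ ]report[ ]for[ ]       # literal header text
    \d{1,3}(?:\.\d{1,3}){3}       # IP-like pattern
    [ ]\#\#                       # a space followed by two hash signs
    [\t ]*                        # optional trailing spaces or tabs
    (?:                           # non-capturing group, repeated 1+ times
        \r?\n\s*\d{10}            #   newline, optional whitespace, 10 digits
        [ \t].*                   #   space or tab, then the rest of the line
    )+
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample: one report with an error, one without.
sample = (
    "-----\n"
    "Errpt report for 192.1.152.10 ## \n"
    "\n"
    "0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "Errpt report for 192.1.119.169 ## \n"
    "\n"
    "-----\n"
)

blocks = [m.group() for m in REPORT_BLOCK.finditer(sample)]
print(blocks)
```

Only the first report is returned, because the second one has no line starting with 10 digits.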

Regex demo

For example:

import re

regex = r"^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+"

with open("log1.txt", "r") as log1, open("output.txt", "w") as filteredLog:
    output = re.findall(regex, log1.read(), re.M)
    filteredLog.write("\n".join(output))

Result

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
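
For large logs, reading the whole file into memory may be undesirable. As an alternative, a line-by-line filter can buffer each report and emit it only when it contains an error line. This is a minimal sketch and not the approach used in the answers above; filter_reports, SEPARATOR and ERROR_LINE are hypothetical names, and the patterns assume separators consist only of hyphens and error lines start with 10 digits.

```python
import io
import re

SEPARATOR = re.compile(r"^-+\s*$")
ERROR_LINE = re.compile(r"^\s*\d{10}\s")

def filter_reports(lines):
    """Yield the lines of every report block that has at least one error line."""
    block, has_error = [], False
    for line in lines:
        if SEPARATOR.match(line):
            if has_error:
                yield from block
            block, has_error = [line], False  # separator opens the next block
        else:
            block.append(line)
            has_error = has_error or bool(ERROR_LINE.match(line))
    if has_error:  # last block when the input does not end with a separator
        yield from block

# Hypothetical sample: one report with an error, one without.
sample = io.StringIO(
    "-----\n"
    "Errpt report for 192.1.152.10 ##\n"
    "\n"
    "0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "Errpt report for 192.1.119.169 ##\n"
    "\n"
    "-----\n"
)

kept = list(filter_reports(sample))
print("".join(kept), end="")
```

With real files, outfile.writelines(filter_reports(infile)) would write the kept blocks to the output file. Lines are passed through unchanged, so any indentation in the input is preserved.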
