How to capture a regex match and the line above it, and send them to a file?

I have a file that contains server IP addresses and the errors reported on those servers.

I need to capture the IP of each server that has reported an error, along with its error messages.

I tried the code below, but it captures only the regex match and not the line above it.

import re

with open("log1.txt") as a, open("output.txt", "a") as out:
    for line in a:
        # Matches only lines that start with digits; the line above is not kept
        if re.match(r'(\d+)', line):
            print(line, file=out)

Input:

---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

Expected Output:

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED

Use itertools.tee to make two iterators over your input file, and use one of them to cache the previous line (for output).

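import itertools
import re

with open("log1.txt") as infile, open("output.txt", 'w') as outfile:
    cache, infile = itertools.tee(infile)
    next(infile, None)  # advance so infile runs one line ahead of cache
    for err, line in zip(cache, infile):
        # err is the line above; allow leading whitespace before the digits
        if re.match(r'\s*\d+', line):
            print(err, line, sep='', end='', file=outfile)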
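
In the sample input a blank line sits between each Errpt report header and its first error line, so the immediately preceding line is not always the one wanted. Below is a minimal sketch of a variation that remembers the most recent header and emits it once, just before the first error line that follows. It assumes every header contains the text "Errpt report for" and every error line starts with digits after optional whitespace; the name filter_reports is hypothetical.

```python
import re

HEADER = re.compile(r"\s*Errpt report for ")  # assumed header format
ERROR = re.compile(r"\s*\d+")                 # assumed: error lines start with digits


def filter_reports(lines):
    """Return each header that has at least one error line, followed by its errors."""
    kept = []
    pending_header = None  # header seen but not yet emitted
    for line in lines:
        if HEADER.match(line):
            pending_header = line
        elif ERROR.match(line):
            if pending_header is not None:
                kept.append(pending_header)  # emit the header once, on its first error
                pending_header = None
            kept.append(line)
    return kept


sample = [
    "----------\n",
    "    Errpt report for 192.1.152.10 ##\n",
    "\n",
    "    0717032319 T H ent2 ETHERNET DOWN\n",
    "----------\n",
    "    Errpt report for 192.1.119.169 ##\n",
    "\n",
    "----------\n",
]
result = filter_reports(sample)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the returned list handed to writelines on the output file.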

My guess is that this expression is likely to return the desired output:

Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+

Test with re.findall

import re

regex = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"

test_str = """
---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

"""

print(re.findall(regex, test_str, re.M))

Output

['Errpt report for 192.1.152.10 ## \n\n    0717032319 T H ent2 ETHERNET DOWN', 'Errpt report for 172.11.71.113 ##  \n\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED', 'Errpt report for 172.1.79.114 ## \n\n    0717032319 T H ent3 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent5 PROBLEM RESOLVED\n    0717032319 T H ent6 PROBLEM RESOLVED']
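
The matches above still carry the indentation of the log, while the expected output has none. A small sketch of one way to tidy them, assuming it is acceptable to strip leading and trailing whitespace from every line of a match; the helper name clean_matches is hypothetical.

```python
import re

regex = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"


def clean_matches(text):
    """Find report blocks and strip surrounding whitespace from each of their lines."""
    blocks = []
    for match in re.findall(regex, text, re.M):
        lines = [line.strip() for line in match.splitlines()]
        blocks.append("\n".join(lines))
    return blocks


sample = (
    "-----\n"
    "    Errpt report for 192.1.152.10 ## \n"
    "\n"
    "    0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "    Errpt report for 192.1.119.169 ## \n"
    "\n"
    "-----\n"
)
blocks = clean_matches(sample)
```

Note that the lazy [\s\S]*? keeps expanding until it reaches a line of 10 digits, so a header without errors that is followed later by a header with errors would be pulled into the same match. In the sample log the error-free reports all come last, which is why this does not show up there.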

Demo

The expression is explained on the top right panel of regex101.com, if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.

You could match the whole line of hyphens and the first line of each report, then use a repeating pattern to match the following lines that start with 10 digits.

Instead of using re.search, which looks for the first location where the regular expression pattern produces a match, you might use re.findall and write all the matches to your output.txt file.

^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+

Explanation

  • ^ Start of a line (the pattern is applied with re.M)
  • -+\r?\n Match - one or more times, followed by a newline
  • Errpt report for Match literally
  • \d{1,3}(?:\.\d{1,3}){3} ## Match an IP-like pattern, a space, and ##
  • [\t ]* Match a space or tab 0+ times
  • (?: Non-capturing group
    • \r?\n\s*\d{10} Match a newline, 0+ whitespace chars and 10 digits
    • [ \t].* Match a space or tab, then any char except a newline 0+ times
  • )+ Close the non-capturing group and repeat it 1+ times
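
The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores unescaped whitespace and treats # as a comment marker outside character classes. This is only a sketch of that idea: literal spaces are written as [ ] and the hash signs are escaped, and the name REPORT and the unindented sample text are assumptions for illustration.

```python
import re

REPORT = re.compile(r"""
    ^-+\r?\n                        # a full line of hyphens
    Errpt[ ]report[ ]for[ ]         # literal header text
    \d{1,3}(?:\.\d{1,3}){3}         # IP-like pattern
    [ ]\#\#[\t ]*                   # " ##" plus trailing spaces or tabs
    (?:\r?\n\s*\d{10}[ \t].*)+      # one or more lines starting with 10 digits
""", re.M | re.X)

# The compact one-line form of the pattern, for comparison
PLAIN = r"^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+"

sample = (
    "-----\n"
    "Errpt report for 192.1.152.10 ## \n"
    "\n"
    "0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "Errpt report for 192.1.119.169 ## \n"
    "\n"
    "-----\n"
)
verbose_matches = REPORT.findall(sample)
plain_matches = re.findall(PLAIN, sample, re.M)
```

Both forms find the same single block here, and the report with no error lines is skipped because the repeated group needs at least one line of 10 digits.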

Regex demo

For example:

import re

regex = r"^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+"

with open("log1.txt", "r") as log1, open("output.txt", "w") as filteredLog:
    # re.M lets ^ match at the start of every line, not only the start of the file
    output = re.findall(regex, log1.read(), re.M)
    filteredLog.write("\n".join(output))

Result

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
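
If the goal is the IP of each failing server together with its messages rather than the raw text, a variation with named groups could collect them into a dictionary. This is a sketch under assumptions: the pattern is adapted to tolerate leading spaces or tabs before the header, the hyphen line is not matched, and the names REPORT and errors_by_ip are hypothetical.

```python
import re

REPORT = re.compile(
    r"^[ \t]*Errpt report for (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) ##[ \t]*"
    r"(?P<errors>(?:\r?\n\s*\d{10}[ \t].*)+)",
    re.M,
)


def errors_by_ip(text):
    """Map each IP that reported errors to the list of its error lines."""
    result = {}
    for m in REPORT.finditer(text):
        # Drop the blank separator lines and the indentation around each error
        lines = [ln.strip() for ln in m.group("errors").splitlines() if ln.strip()]
        result.setdefault(m.group("ip"), []).extend(lines)
    return result


sample = (
    "-----\n"
    "    Errpt report for 192.1.152.10 ## \n"
    "\n"
    "    0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "    Errpt report for 172.11.71.113 ##  \n"
    "\n"
    "    0717032319 T H ent2 PROBLEM RESOLVED\n"
    "    0717032319 T H ent2 PROBLEM RESOLVED\n"
    "-----\n"
    "    Errpt report for 192.1.119.169 ## \n"
    "\n"
    "-----\n"
)
report = errors_by_ip(sample)
```

Servers whose report has no error lines never match, so they do not appear in the dictionary at all.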
