
Regex strictly match two lines with different endings

I'm trying to match the following text from a logfile:

2019-05-22 03:40:01 INFO  ReporteClaro:194 - Termino de procesar archivo

2019-05-22 03:40:01 INFO  ReporteClaro:208 - Termino de procesar Transaction Report

The two lines contain the same words except for those at the end (archivo) and (Transaction Report).

I've tried this:

[\d]+-[\d]+-[\d]+ [\d]+:[\d]+:[\d]+ INFO  ReporteClaro:[\d]+ - Termino de procesar (archivo|Transaction Report)

But that is an either/or match because of the | operator. It will match the first or the second line, but I strictly need the regex to match both of them. I thought of something like this, but obviously it will not run:

[\d]+-[\d]+-[\d]+ [\d]+:[\d]+:[\d]+ INFO  ReporteClaro:[\d]+ - Termino de procesar (archivo&Transaction Report)

P.S.: I've tried another solution using \n, but is there any way to achieve the same result without repeating the pattern?

[\d]+-[\d]+-[\d]+ [\d]+:[\d]+:[\d]+ INFO  ReporteClaro:[\d]+ - Termino de procesar archivo\n

[\d]+-[\d]+-[\d]+ [\d]+:[\d]+:[\d]+ INFO  ReporteClaro:[\d]+ - Termino de procesar Transaction Report

If "archivo" and "Transaction Report" are the only values you expect after "Termino de procesar" (i.e. there is nothing like "Termino de procesar Something Else"), you could simply do the following.

r"^.+Termino de procesar.+$"gm

demo

Effectively, this captures each whole line, from beginning to end, only if the line contains the phrase "Termino de procesar".
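A minimal Python sketch of that first pattern, assuming the whole log is held in one string. Here `re.MULTILINE` stands in for the `m` flag and `re.findall` for `g`; the third line of `LOG` and the helper name `matching_lines` are made up for illustration.

```python
import re

# Sample log; the third line is an invented entry that should not match.
LOG = (
    "2019-05-22 03:40:01 INFO  ReporteClaro:194 - Termino de procesar archivo\n"
    "2019-05-22 03:40:01 INFO  ReporteClaro:208 - Termino de procesar Transaction Report\n"
    "2019-05-22 03:40:02 INFO  ReporteClaro:210 - Inicio de proceso\n"
)

# re.MULTILINE makes ^ and $ anchor at every line, like the "m" flag.
PATTERN = re.compile(r"^.+Termino de procesar.+$", re.MULTILINE)


def matching_lines(text):
    """Return every whole line that contains the phrase."""
    return PATTERN.findall(text)


for line in matching_lines(LOG):
    print(line)
```

Because `.` does not cross a newline by default, each match stays on its own line.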

If there are other log entries that contain "Termino de procesar" but that you don't want, you can use the following.

r"^.+Termino de procesar archivo.*$|^.+Termino de procesar Transaction Report.*$"gm

demo2
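The alternation still matches either line on its own, so one way to insist on both is to collect the matches and check afterwards that each ending showed up. This is only a sketch, assuming "strictly match both" means "both entries are present somewhere in the log"; `REQUIRED` and `has_both_endings` are hypothetical names.

```python
import re

# The same two alternatives, folded into one group that captures the ending.
PATTERN = re.compile(
    r"^.+Termino de procesar (archivo|Transaction Report).*$",
    re.MULTILINE,
)

# Endings that must all appear at least once (assumption for this sketch).
REQUIRED = {"archivo", "Transaction Report"}


def has_both_endings(text):
    """True only if the log contains a line for each required ending."""
    found = {m.group(1) for m in PATTERN.finditer(text)}
    return REQUIRED <= found
```

Keeping the "both present" rule in plain code avoids repeating the long prefix inside the regex.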

I find simplicity to usually be the best solution. There is no need to explicitly match the datetime or "ReporteClaro"; simply use the catch-all before the phrase to grab them. That makes the regex easier to understand, IMO.

Edit: You need the g and m modifiers unless you're reading the file line by line.
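For the line-by-line case, a rough sketch in Python: each call sees a single line, so no multiline flag is needed. The helper name `filter_lines` is hypothetical.

```python
import re

# No MULTILINE flag: every call sees exactly one line, so ^ and $ already
# refer to the start and end of that line.
PATTERN = re.compile(r"^.+Termino de procesar.+$")


def filter_lines(lines):
    """Yield the lines that contain the phrase, one line at a time."""
    for raw in lines:
        line = raw.rstrip("\n")
        if PATTERN.search(line):
            yield line
```

Any iterable of lines works, including an open file object, so the log never has to be loaded into memory in one piece.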

This will get them as a group, along with everything in between.

(?s)[\d]+-[\d]+-[\d]+[ ][\d]+:[\d]+:[\d]+[ ]INFO[ ]+ReporteClaro:[\d]+[ ]-[ ]Termino[ ]de[ ]procesar[ ](?:archivo|Transaction[ ]Report)(?:.*?[\d]+-[\d]+-[\d]+[ ][\d]+:[\d]+:[\d]+[ ]INFO[ ]+ReporteClaro:[\d]+[ ]-[ ]Termino[ ]de[ ]procesar[ ](?:archivo|Transaction[ ]Report))*

Readable version

 (?s)

 [\d]+ - [\d]+ - [\d]+ [ ] [\d]+ : [\d]+ : [\d]+ [ ] INFO [ ]+ ReporteClaro: 
 [\d]+ [ ] - [ ] Termino [ ] de [ ] procesar [ ] 
 (?: archivo | Transaction [ ] Report )

 (?:
      .*? [\d]+ - [\d]+ - [\d]+ [ ] [\d]+ : [\d]+ : [\d]+ [ ] INFO [ ]+ ReporteClaro: 
      [\d]+ [ ] - [ ] Termino [ ] de [ ] procesar [ ] 
      (?: archivo | Transaction [ ] Report )
 )*
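The spaced-out layout only works as an actual pattern when free-spacing mode is on. A sketch of how that could look in Python, using `re.VERBOSE` for the layout and `re.DOTALL` in place of `(?s)`. The names `ENTRY` and `entry_block` are hypothetical, and `[ ]+` after INFO is an assumption based on the sample log lines, which show two spaces there.

```python
import re

# One log entry in free-spacing style. [ ] is a literal space, because
# re.VERBOSE ignores unescaped whitespace outside character classes.
ENTRY = r"""
    \d+ - \d+ - \d+ [ ] \d+ : \d+ : \d+ [ ]
    INFO [ ]+ ReporteClaro: \d+ [ ] - [ ]
    Termino [ ] de [ ] procesar [ ]
    (?: archivo | Transaction [ ] Report )
"""

# First entry, then lazily skip ahead to each further entry.
PATTERN = re.compile(
    ENTRY + r"(?: .*? " + ENTRY + r")*",
    re.VERBOSE | re.DOTALL,
)


def entry_block(text):
    """Return the span from the first matching entry to the last, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None
```

Building the pattern from one `ENTRY` string keeps the repeated part in a single place, which is close to the "without repeating" goal from the question.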
