Python regex: how to find a substring that starts with a given word and ends with one of two words

Question

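I am parsing a log file and trying to find a substring that starts with a given pattern (say, Start log ) and ends with one of two patterns (say, exit code \d. or took \d* seconds. ), whichever comes later.

I tried the following, without success: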

import re

block_regex1 = re.compile(r'Start log .*?(exit code \d.|took \d* seconds.)', re.DOTALL)
block_regex2 = re.compile(r'Start log .*? exit code \d.|Start log .*? took \d* seconds.', re.DOTALL)

block_regex1.findall(log)  # log holds the contents of the log file

Sample log file:

Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.

With the code above, it should output a list:

Start log 1 doing stuff Finished with exit code 1.
Start log 2 doing stuff Finished with exit code 0. log 2 took 12 seconds.
...

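Ultimately, I want to extract the log ID, the exit code, and the time in seconds (if present). I think I can use groups for that, but I am still working on it.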

Answer 1

Use

Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.

See proof. In short: match as much text as possible from Start log up to exit code or took xxx seconds, without allowing Start log in between.

Explanation
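
A minimal sketch of applying the pattern above from Python, assuming the whole log has been read into a single string. The names LOG, BLOCK_RE and blocks are illustrative and not from the original post. Because each block spans several lines, the sketch passes re.DOTALL so that . also matches line breaks; without that flag the pattern would not match these multi-line blocks.

```python
import re

# Sample log from the question, held in a single string (assumption:
# in practice this would come from reading the log file).
LOG = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

# The pattern from the answer, compiled with DOTALL so "." crosses newlines.
BLOCK_RE = re.compile(
    r"Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.",
    re.DOTALL,
)

# No capturing groups are used, so findall returns the full matched blocks.
blocks = BLOCK_RE.findall(LOG)

for block in blocks:
    # Collapse line breaks to get the one-line form shown in the question.
    print(" ".join(block.split()))
```

The greedy repetition runs to the end of the current block and then backtracks to the last exit code or took line, which is what gives the "whichever comes later" behaviour.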

--------------------------------------------------------------------------------
  Start log                'Start log '
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      Start log                'Start log'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n (with re.DOTALL,
                             any character including \n)
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    exit code                'exit code '
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    took                     'took '
--------------------------------------------------------------------------------
    \d*                      digits (0-9) (0 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
     seconds                 ' seconds'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \.                       '.'
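
The question also asks for the log ID, the exit code, and the time in seconds when present. One possible way to get there, sketched under assumptions: find each block with the pattern above, then search inside the block with three small patterns. The helper names (parse_log, _first_int) and the record layout are hypothetical, and the sketch assumes log IDs are numeric, as in the sample.

```python
import re

# Block pattern from the answer; DOTALL lets "." cross line breaks.
BLOCK_RE = re.compile(
    r"Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.",
    re.DOTALL,
)
# Small per-field patterns (assumption: IDs, codes and times are integers).
ID_RE = re.compile(r"Start log (\d+)")
EXIT_RE = re.compile(r"exit code (\d+)\.")
TOOK_RE = re.compile(r"took (\d+) seconds\.")


def _first_int(pattern, text):
    """Return the first captured group as an int, or None if absent."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None


def parse_log(log):
    """Return one dict per block with id, exit_code and seconds."""
    records = []
    for block in BLOCK_RE.findall(log):
        records.append(
            {
                "id": _first_int(ID_RE, block),
                "exit_code": _first_int(EXIT_RE, block),
                "seconds": _first_int(TOOK_RE, block),  # None when missing
            }
        )
    return records


# Sample log from the question.
LOG = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

for record in parse_log(LOG):
    print(record)
```

For the sample log this yields four records with IDs 1 to 4; seconds is 12 and 10 for logs 2 and 3 and None for logs 1 and 4. Searching each block separately avoids relying on the exit code line always preceding the took line.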

Solution 1
1 Accepted 2021-05-14 20:35:27
