
Python regex: how to find a substring that starts with a given word and ends with either of two words

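I am parsing a log file and trying to find a substring that starts with a given pattern (say 'Start log ') and ends with either of two patterns (say 'exit code \d.' or 'took \d* seconds.'), whichever comes later.

I tried the following without success: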

import re

# log holds the full text of the log file
block_regex1 = re.compile(r'Start log .*?(exit code \d.|took \d* seconds.)', re.DOTALL)
block_regex2 = re.compile(r'Start log .*? exit code \d.|Start log .*? took \d* seconds.', re.DOTALL)

block_regex1.findall(log)

Sample log file:

Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.

With the code above, it should output a list:

  • Start log 1 doing stuff Finished with exit code 1.
  • Start log 2 doing stuff Finished with exit code 0. log 2 took 12 seconds.
  • ...

Ultimately, I want to get the log ID, the exit code, and the time in seconds (if present). I think I can use groups for that, but I am still working on it.

Use

Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.

In short: match as much text as possible from 'Start log' up to 'exit code' or 'took xxx seconds', with no 'Start log' allowed in between.
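As a minimal sketch, the pattern can be applied to the sample log from the question like this. It assumes re.DOTALL is passed (as in the question's own attempts) so that '.' can cross line breaks; the names log, block_regex and blocks are illustrative only.

```python
import re

# Sample log from the question.
log = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

# re.DOTALL lets '.' match newlines; the negative lookahead keeps a
# match from running into the next 'Start log' entry.
block_regex = re.compile(
    r"Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.",
    re.DOTALL,
)

# The pattern has no capturing groups, so findall returns whole matches.
blocks = block_regex.findall(log)
for block in blocks:
    print(block.replace("\n", " "))
```

Because the inner repetition is greedy, each match runs to the last 'exit code' or 'took ... seconds' ending that occurs before the next 'Start log', which is the "whichever comes later" behaviour the question asks for.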

Explanation

--------------------------------------------------------------------------------
  Start log                'Start log '
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      Start log                'Start log'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n (any character,
                             including \n, when re.DOTALL is set)
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    exit code                'exit code '
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    took                     'took '
--------------------------------------------------------------------------------
    \d*                      digits (0-9) (0 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
     seconds                 ' seconds'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \.                       '.'
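To also pull out the log ID, the exit code and the optional duration that the question mentions, one possible variation (not part of the pattern above) is to add named groups. This sketch assumes that log IDs are numeric and that the 'exit code' line always comes before the 'took ... seconds' line, as in the sample; entry_regex, parse_entries and sample_log are hypothetical names.

```python
import re

sample_log = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

# Assumption: 'exit code N.' precedes the optional 'took N seconds.' line.
entry_regex = re.compile(
    r"Start log (?P<id>\d+)"            # log ID
    r"(?:(?!Start log).)*?"             # body, never crossing the next entry
    r"exit code (?P<code>\d+)\."        # exit code
    r"(?:(?:(?!Start log).)*?took (?P<seconds>\d+) seconds\.)?",  # optional time
    re.DOTALL,
)

def parse_entries(log):
    """Return (log_id, exit_code, seconds_or_None) for each entry in log."""
    entries = []
    for m in entry_regex.finditer(log):
        seconds = m.group("seconds")
        entries.append((
            int(m.group("id")),
            int(m.group("code")),
            int(seconds) if seconds is not None else None,
        ))
    return entries

entries = parse_entries(sample_log)
print(entries)
```

Entries without a 'took ... seconds' line get None for the duration, since the last group is optional.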
