Python regex: how to find a substring that starts with a given word and ends with either of two words
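I am parsing a log file and trying to find a substring that starts with a given pattern (say, Start log) and ends with one of two patterns (say, exit code \d. or took \d* seconds.), whichever comes later.
I tried the following, without success: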
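import re
# Attempt 1: a single pattern with both endings as alternatives
block_regex1 = re.compile(r'Start log .*?(exit code \d.|took \d* seconds.)', re.DOTALL)
# Attempt 2: the two endings written as separate full alternatives
block_regex2 = re.compile(r'Start log .*? exit code \d.|Start log .*? took \d* seconds.', re.DOTALL)
block_regex1.findall(log)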
Sample log file:
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
With the code above, I expect the output to be a list like:
Start log 1 doing stuff Finished with exit code 1.
Start log 2 doing stuff Finished with exit code 0. log 2 took 12 seconds.
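Ultimately, I want to get the log ID, the exit code, and the time in seconds (if present). I think I can use groups for that, but I am still looking into it.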
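Use
Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.
See proof. In short: match as much text as possible from Start log up to exit code or took xxx seconds, without allowing another Start log in between. Keep the re.DOTALL flag from the question so that . can also match newlines.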
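As a rough sketch of how this pattern could be applied to the sample log from the question: the variable names log, block_regex and blocks are only illustrative, and re.DOTALL is assumed so that the blocks can span several lines.

```python
import re

# Sample log copied from the question.
log = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

# The tempered greedy token (?:(?!Start log).)* consumes everything up to
# the next "Start log", then backtracks to the last possible ending.
block_regex = re.compile(
    r"Start log (?:(?!Start log).)*(?:exit code \d+|took \d* seconds)\.",
    re.DOTALL,  # let "." match newlines as well
)

blocks = block_regex.findall(log)
for block in blocks:
    # Collapse newlines into single spaces for display.
    print(" ".join(block.split()))
```

Because the pattern has no capturing groups, findall returns the full text of each block. The first two printed lines should match the two expected lines listed in the question.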
Explanation
--------------------------------------------------------------------------------
Start log 'Start log '
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
Start log 'Start log'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n (any character, including \n, when re.DOTALL is set)
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
exit code 'exit code '
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
took 'took '
--------------------------------------------------------------------------------
\d* digits (0-9) (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
seconds ' seconds'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
\. '.'
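The question also mentions wanting the log ID, the exit code, and the time in seconds when present. The following is one possible sketch, not part of the answer above. It assumes that the log ID is numeric, that the exit code line always appears in a block, and that the optional took line comes after it. The group names id, code and seconds, and the variable names, are hypothetical.

```python
import re

# Shortened sample in the same format as the question's log.
log = """\
Start log 1
doing stuff
Finished with exit code 1.
Start log 2
doing stuff
Finished with exit code 0.
log 2 took 12 seconds.
Start log 3
doing stuff
Finished with exit code 0.
log 3 took 10 seconds.
Start log 4
doing stuff
Finished with exit code 1.
"""

record_regex = re.compile(
    r"Start log (?P<id>\d+)"            # log ID
    r"(?:(?!Start log).)*?"             # block body, never crossing a new block
    r"exit code (?P<code>\d+)\."        # exit code (assumed always present)
    # Optional duration, still within the same block:
    r"(?:(?:(?!Start log).)*?took (?P<seconds>\d+) seconds\.)?",
    re.DOTALL,
)

records = []
for m in record_regex.finditer(log):
    seconds = m.group("seconds")  # None when the block has no "took" line
    records.append((
        int(m.group("id")),
        int(m.group("code")),
        int(seconds) if seconds is not None else None,
    ))

print(records)
```

Each tuple holds the log ID, the exit code, and the duration in seconds, with None where the block has no took line. If the took line could come before the exit code line in real logs, the pattern would need to be adjusted.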