Finding the word immediately after a character with a regular expression
I am trying to find the word that comes immediately after '%' in the following lines:
RP/0/RP0/CPU0:Feb 26 20:04:01.869 UTC: esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :SWITCH_LINK_ERR_E :DECLARE :0/RP0/CPU0/7:
LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: %PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed
To start, I used the following Python code, and it works.
result = re.search(r"\%.* \: ", txt)
result.group()
And here is the result:
However, my regex fails on lines like this:
LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc.
Repetitions (* and +) in regular expressions default to "greedy" mode: they try to match the longest possible piece of text. In the failure case you provided, there are additional colons (:) in the message after the word to match, so the greedy star * ran past the first one and matched up to the last " : " in the line.
You can change the behavior to "lazy" (or "non-greedy") by adding a question mark (?) after the repetition, changing the pattern to:
result = re.search(r"\%.*? \: ", txt)
Check out the results here. For more information, consider reading this article.
What you want is a percent sign followed by one or more non-space characters:
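To make the greedy/lazy difference concrete, here is a minimal sketch that runs both patterns from this thread against the failing line. The log line is shortened for readability and the variable names are only illustrative.

```python
import re

# Shortened copy of the failing log line from the question.
line = ("LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: "
        "%FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: "
        "For asic 0 : A0 Errata: Observed RX CODE errors on link 120")

# Greedy: .* extends to the LAST " : " in the line.
greedy = re.search(r"\%.* \: ", line).group()

# Lazy: .*? stops at the FIRST " : " after the percent sign.
lazy = re.search(r"\%.*? \: ", line).group()

print(repr(greedy))
print(repr(lazy))
```

Note that the lazy match still includes the leading % and the trailing " : ", so it would need trimming (or a capture group) to get the bare word.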
re.search(r"%\S+", s)
#<_sre.SRE_Match object; span=(52, 84), match='%FABRIC-NPU_DRVR-1-PACIFIC_ERROR'>
You could use:
re.search(r'%([^\s]+)', s).group(1)
Output (tested against the line for which your regex fails):
FABRIC-NPU_DRVR-1-PACIFIC_ERROR
Or you can use:
re.search(r'%(\S+)', s).group(1) # \S is the same as [^\s]
Try:
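One caveat with calling .group(1) directly: re.search returns None when a line has no match, so the call raises AttributeError. A small sketch of a guarded version follows; the helper name word_after_percent and the third sample line are hypothetical, added only for illustration.

```python
import re

def word_after_percent(line):
    """Return the token that follows the first '%', or None if there is none."""
    match = re.search(r"%(\S+)", line)
    return match.group(1) if match else None

lines = [
    "RP/0/RP0/CPU0:Feb 26 20:04:01.869 UTC: esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :SWITCH_LINK_ERR_E :DECLARE :0/RP0/CPU0/7:",
    "LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: %PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed",
    "a line with no percent sign",  # hypothetical input, not from the question
]

results = [word_after_percent(l) for l in lines]
print(results)
```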
import re
x="LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc."
res=re.findall(r"(?<=%)[^\s]+", x)
Outputs:
>>> res
['FABRIC-NPU_DRVR-1-PACIFIC_ERROR']
(?<=%)[^\s]+ - the first bracket, (?<=%), is a lookbehind: it matches only if % comes immediately before the second bracket, without returning % as part of the match. The second bracket, [^\s]+, matches only the word, meaning a string of one or more characters that are not whitespace.
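As a rough comparison, the lookbehind form and the capture-group form from the earlier answer give the same list with re.findall on typical input. The sample string below is made up to show a line with more than one % token.

```python
import re

# Hypothetical line containing two '%' tokens (not from the question).
sample = "proc[1]: %AAA-1 : retry after %BBB-2 : done"

# Lookbehind: '%' must precede the match but is not part of it.
with_lookbehind = re.findall(r"(?<=%)[^\s]+", sample)

# Capture group: '%' is consumed, but findall returns only the group.
with_group = re.findall(r"%(\S+)", sample)

print(with_lookbehind)
print(with_group)
```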