
Finding the word immediately after a character with regular expression

I am trying to find the word that comes immediately after '%' in the following lines:

RP/0/RP0/CPU0:Feb 26 20:04:01.869 UTC: esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :SWITCH_LINK_ERR_E :DECLARE :0/RP0/CPU0/7:

LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: %PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed

To start, I used the following Python code, and it works for the lines above.

result = re.search(r"\%.* \: ", txt)
result.group()

And here is the result:

However, my regex fails on lines like this:

LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc.

Repetitions (* and +) in regular expressions default to "greedy" mode: they try to match the longest possible piece of text. In the failure case you provided, the message contains additional colons (:) after the word to match, so the greedy star * ran on to the last " : " in the line instead of stopping at the first one.

You can change the behavior to "lazy" (or "non-greedy") by adding a question mark (?) after the repetition, which changes the code to:

result = re.search(r"\%.*? \: ", txt)

Check out the results here. For more information, consider reading this article.
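To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the failing line from the question. The variable names greedy and lazy are only illustrative. The backslashes before % and : in the original pattern are optional, so they are omitted here.

```python
import re

# Shortened copy of the failing log line from the question.
txt = ("LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: "
       "%FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: "
       "For asic 0 : A0 Errata: Observed RX CODE errors on link 120")

# Greedy: .* expands as far as it can, so the match ends at the LAST " : ".
greedy = re.search(r"%.* : ", txt).group()

# Lazy: .*? expands as little as it can, so the match ends at the FIRST " : ".
lazy = re.search(r"%.*? : ", txt).group()

print(repr(greedy))
print(repr(lazy))
```

Note that both matches still include the leading % and the trailing " : ", so some trimming is needed if only the word itself is wanted.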

What you want is a percent sign followed by one or more non-space characters:

re.search(r"%\S+", s)  # raw string, so \S is not treated as a string escape
#<_sre.SRE_Match object; span=(52, 84), match='%FABRIC-NPU_DRVR-1-PACIFIC_ERROR'>

You could use:

re.search(r'%([^\s]+)', s).group(1)

Output (tested against the line for which your regex fails):

FABRIC-NPU_DRVR-1-PACIFIC_ERROR

Or you can use:

re.search(r'%(\S+)', s).group(1)  # \S is the same as [^\s]
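Calling .group(1) directly raises AttributeError when a line contains no '%', because re.search returns None. The sketch below wraps the same pattern in a small helper and runs it on the sample lines from the question. The names TAG_RE and extract_tag are made up for illustration, and the last input line is invented to show the no-match case.

```python
import re

# Same pattern as above: '%' followed by one or more non-whitespace characters.
TAG_RE = re.compile(r"%(\S+)")

def extract_tag(line):
    """Return the word right after the first '%', or None if there is none."""
    m = TAG_RE.search(line)
    return m.group(1) if m else None

lines = [
    "RP/0/RP0/CPU0:Feb 26 20:04:01.869 UTC: esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :SWITCH_LINK_ERR_E :DECLARE :0/RP0/CPU0/7:",
    "LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: %PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed",
    "a line without a percent sign",  # invented input, not from the question
]

tags = [extract_tag(line) for line in lines]
print(tags)
```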

Try:

import re

x="LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc."

res=re.findall(r"(?<=%)[^\s]+", x)

Outputs:

>>> res

['FABRIC-NPU_DRVR-1-PACIFIC_ERROR']

(?<=%)[^\s]+ - the first part, (?<=%), is a positive lookbehind: it succeeds only if % comes immediately before the second part, without returning % as part of the match. The second part, [^\s]+, matches the word itself, meaning a run of one or more characters that are not whitespace.
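As a rough check of that explanation, the sketch below compares the lookbehind form with the capture-group form on one of the sample lines from the question. The variable names are only illustrative.

```python
import re

x = ("LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: "
     "%PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed")

# Lookbehind: '%' must precede the match but is not part of it.
with_lookbehind = re.findall(r"(?<=%)[^\s]+", x)

# Capture group: '%' is consumed, but findall returns only the group.
with_group = re.findall(r"%(\S+)", x)

# No lookbehind and no group: the '%' is included in the match.
with_percent = re.search(r"%\S+", x).group()

print(with_lookbehind, with_group, with_percent)
```

Both findall calls return the same list here; the lookbehind only matters when you want the whole match itself, rather than a group, to exclude the '%'.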
