
Take first word after a regex match

I am trying to extract a substring from a string using regex. My function takes a word as a parameter, and the goal is to extract the very next word (by my definition of word) after that match. I have tried lookbehind and some other approaches, but I could not get the results I want, so any help is welcome.

As an example, in the first case the input to my function is **THttpServer**:

23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)
23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)

Expected result: transportTCPChanged and transportUDPOpened, one for each line.

In another case, the input is CurrentUserConnection:

23:25:16.622: INFO: CurrentUserConnection#1:RQ : subscribed(userID: 1)
23:25:16.622: INFO: CurrentUserConnection#8:RP : disconnected

Expected result: subscribed, disconnected.

Things I have tried in Notepad++ (the lookbehind changes depending on the example):

(?<=THttpServer)(\w+) : No matches
(?<=THttpServer)(.*) : Obviously returns the whole rest of the line, not the expected match

I am a bit confused. Maybe it's not even possible? Or do I need some pre-processing?

The lookbehind attempt finds nothing because the character right after THttpServer is a colon, which \w cannot match. You need to match the : after THttpServer, then any non-word chars up to the word, and then match and capture the word itself with (\w+).

E.g. you may use

THttpServer:\W*(\w+)

See the regex demo.

Details

  • THttpServer: - a literal substring
  • \W* - any 0+ non-word chars
  • (\w+) - Capturing group 1 (later accessible via m.group(1)): 1 or more word chars.

See the Python demo:

import re
strs = ['23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)',
        '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)']

rx = re.compile(r'THttpServer:\W*(\w+)')
for s in strs:
    m = rx.search(s)
    if m:
        print("Found '{}' in '{}'.".format(m.group(1), s))

Output:

Found 'transportTCPChanged' in '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'.
Found 'transportUDPOpened' in '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)'.
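The question also asks for the keyword to be a function parameter and shows a second log shape (CurrentUserConnection#1:RQ : subscribed), which the pattern above does not cover as written. A possible generalization is sketched below. The function name next_word_after and the extended pattern are assumptions made for illustration, not part of the original answer; the pattern assumes that anything glued to the keyword (":" or "#1:RQ") is followed by whitespace and, optionally, a free-standing ":" before the wanted word.

```python
import re

def next_word_after(keyword, line):
    """Return the first word following `keyword` in `line`, or None.

    Hypothetical helper: the pattern is an assumption based on the two
    log shapes shown in the question.
    """
    # re.escape  : treat the keyword literally, even if it has metacharacters
    # \S*        : skip anything attached to the keyword (":" or "#1:RQ")
    # \s+        : the whitespace that follows it
    # (?::\s*)?  : an optional free-standing ":" separator
    # (\w+)      : the word to capture
    pattern = re.escape(keyword) + r'\S*\s+(?::\s*)?(\w+)'
    m = re.search(pattern, line)
    return m.group(1) if m else None

lines = [
    ('THttpServer', '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'),
    ('THttpServer', '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)'),
    ('CurrentUserConnection', '23:25:16.622: INFO: CurrentUserConnection#1:RQ : subscribed(userID: 1)'),
    ('CurrentUserConnection', '23:25:16.622: INFO: CurrentUserConnection#8:RP : disconnected'),
]

results = [next_word_after(keyword, line) for keyword, line in lines]
print(results)
```

If the logs can take other shapes (for example no space after the colon), the pattern would need adjusting; re.escape is used so that a keyword containing characters such as . or + is still matched literally.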
