Python RegExp: retrieving a value from a matched string

Question

Dear all,

I have run into a problem parsing a log that is not trivial for me.

I need to go through a file and check whether each line matches a pattern; if it does, I want to get the ClientID specified in that line.

The line looks like this:

17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'

So I need to get 99071901.

I tried to construct a regexp search pattern, but it is incomplete; I am stuck at 'TRACE':

regex = '(^[(\d\.)]+) ([(\d\:)]+) ([\bTRACE\b]+) ([(\d)]+) ([\bGDS\b:)]+) ([\ClientID\b])'

The script code is:

log=open('t.log','r')
for i in log:
    key=re.search(regex,i)
    print(key.group()) #print string matching 
    for g in key:
        client_id=re.seach(????,g) # find ClientIt    
log.close()

I would appreciate a hint on how to solve this.

Thank you.

Answer 1

You don't need to be too specific. You can just capture the sections and parse them individually.

Let's start with just your one line, for example:

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

And then let's add our first regex, which captures all the sections:

import re
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
# now extract each section
date, time, level, thread, module, message = line_regex.match(line).groups()
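
To see what each captured section holds for the sample line, it may help to print them side by side. This is only a small sketch; the `sections` dictionary and its key names are illustrative choices, not part of the original answer.

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')

# Unpack the six capture groups in the order they appear in the pattern.
date, time, level, thread, module, message = line_regex.match(line).groups()

# Hypothetical helper dict, used here only to label the printed values.
sections = {
    "date": date,
    "time": time,
    "level": level,
    "thread": thread,
    "module": module,
    "message": message,
}
for name, value in sections.items():
    print(f"{name:>7}: {value!r}")
```

Note that `thread` keeps its square brackets and `module` loses its trailing colon, because the colon sits outside the fifth group in the pattern.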

Now, if we look at the different sections, they hold all the information we need to make further decisions or to parse them further. Let's get the client ID when the right kind of message shows up.

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

if 'ClientID' in message:
    client_id = client_id_regex.match(message).group(1)

And now we have the client_id.

Just work that logic into your loop and you are all set.

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

with open('t.log', 'r') as f:  # the with statement closes the file automatically
    for line in f:  # iterate over the lines
        sections = line_regex.match(line)  # match object holding the sections
        if not sections:
            continue  # the line is not in the expected format; handle as needed
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:  # only look for a client ID where one can appear
            found = client_id_regex.match(message)
            if found:  # skip ClientID messages that do not have the expected shape
                client_id = found.group(1)
                # now do what you wanted to do with client_id
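
To try the loop without a log file, the same logic could be packaged as a generator that accepts any iterable of lines. The sketch below is one possible arrangement: the name `iter_client_ids` is hypothetical, and the second and third sample lines are invented for illustration (only the first comes from the question). It checks the result of `client_id_regex.match` before calling `.group(1)`, since `match` returns `None` when a message mentions ClientID in some other shape.

```python
import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")


def iter_client_ids(lines):
    """Yield the client ID from every line that carries one (hypothetical helper)."""
    for line in lines:
        sections = line_regex.match(line)
        if not sections:
            continue  # not a log line in the expected layout
        message = sections.group(6)  # sixth group is the free-text message
        found = client_id_regex.match(message)
        if found:  # None when the message lacks the '' -> '<digits>' shape
            yield found.group(1)


# Sample input: the first line is from the question, the others are made up.
sample = [
    "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'\n",
    "17.02.09 10:42:31.300 TRACE [1245] GDS:     someText(SomeText).ClientID: unexpected format\n",
    "not a log line\n",
]
client_ids = list(iter_client_ids(sample))
print(client_ids)
```

With a real file, `iter_client_ids(f)` inside a `with open('t.log') as f:` block would work the same way, because a file object is itself an iterable of lines.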

Answer 2

You may put capturing parentheses around the parts of the pattern that you are interested in, and then access those parts with group(n), where n is the corresponding group ID:

import re
s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
regex = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
m = re.search(regex, s)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))
    print(m.group(5))

See the Python online demo.

The pattern is

^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$

See its online demo here.
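
As a variation on the numbered groups, Python's `re` module also supports named groups with the `(?P<name>...)` syntax, which can make the extracted fields easier to read. The following is a sketch only; the group names (`date`, `time`, `level`, `thread`, `client_id`) are chosen here for illustration and do not appear in the original answer.

```python
import re

s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

# Same pattern as above, with each capturing group given an illustrative name.
named = re.compile(
    r"^(?P<date>[\d.]+)\s+(?P<time>[\d.:]+)\s+(?P<level>TRACE)\s+"
    r"\[(?P<thread>\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(?P<client_id>\d+)'$"
)

m = named.search(s)
# groupdict() maps each group name to the text it captured.
fields = m.groupdict() if m else {}
print(fields.get("client_id"))
```

Named groups stay accessible by number as well, so `m.group(5)` and `m.group("client_id")` refer to the same text.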

Note that you have mixed up character classes with groups: (...) groups subpatterns and captures them, while [...] defines a character class that matches a single character.
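
The difference can be seen on the `TRACE` part of the pattern from the question. In this small sketch, the test string `'CART'` is an arbitrary example: the character class accepts it because every letter belongs to the set, while the group only accepts the literal word. Inside `[...]`, `\b` stands for the backspace character rather than a word boundary.

```python
import re

# Character class: any run of the characters T, R, A, C, E (and backspace).
char_class = re.compile(r'[\bTRACE\b]+')
# Group: the literal word TRACE, captured, between word boundaries.
group = re.compile(r'\b(TRACE)\b')

class_hit = char_class.fullmatch('CART')    # matches: C, A, R, T are all in the set
group_miss = group.fullmatch('CART')        # no match: 'CART' is not the word TRACE
group_hit = group.search('10:42:31.242 TRACE [1245]')

print(class_hit is not None)
print(group_miss is None)
print(group_hit.group(1))
```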

Answer 1
2 2017-02-13 11:09:18

Answer 2
1 Accepted 2017-02-13 12:13:42
