简体   繁体   中英

Python RegExp retrieve value from matching string

Deal all

I faced some not trivial problem for me to parse log.

I need to go through a file and check if the line matches the patter : if YES then get ClientID specified in this line.

The line looks like :

17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'

So I need to get 99071901.

I tried to construct regexp search pattern, but it is not complete..stuck at 'TRACE':

regex = '(^[(\d\.)]+) ([(\d\:)]+) ([\bTRACE\b]+) ([(\d)]+) ([\bGDS\b:)]+) ([\ClientID\b])'

Script code is :

log=open('t.log','r')
for i in log:
    key=re.search(regex,i)
    print(key.group()) #print string matching 
    for g in key:
        client_id=re.seach(????,g) # find ClientIt    
log.close()

Appreciate if you give me a hint how to solve this challenge.

Thank you.

You don't need to be too specific. You can just capture the sections and parse them individually.

Lets start with just your one line for example:

line = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"

And then lets add our first regex that gets all the sections:

import re
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
# now extract each section
date, time, level, thread, module, message = line_regex.match(line).groups()

Now, if we look at the different sections they will have all the information we need to make more decisions, or further parse them. Now lets get the client ID when the right kind of message shows up.

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

if 'ClientID' in message:
    client_id = client_id_regex.match(message).group(1)

And now we have the client_id .


Just work that logic into your loop and you are all set.

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

with open('t.log','r') as f:  # use with context manager to auto close the file
    for line in f:  # lets iterate over the lines
        sections = line_regex.match(line)  # make a match object for sections
        if not sections:
            continue  # probably you want to handle this case
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:  # should we even look here for a client id?
            client_id = client_id_regex.match(message).group(1)
# now do what you wanted to do

You may use capturing parentheses around those parts in the pattern that you are interested in, and then access those parts using group(n) where n is the corresponding group ID:

import re
s = "17.02.09 10:42:31.242 TRACE [1245] GDS:     someText(SomeText).ClientID: '' -> '99071901'"
regex = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
m = re.search(regex, s)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))
    print(m.group(5))

See the Python online demo

The pattern is

^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$

See its online demo here .

Note that you have messed the character classes with groups: (...) groups subpatterns and captures them while [...] defines character classes that match single characters.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM