
Python and regex: split a string on square brackets

In a log file I've got the following format on each line:

[date] [thread] [loglevel] [class] some text describing the event that happened.

I'd like to iterate through the logs and split the strings so that I have the following: ['date','thread','loglevel','class','some text describing the event that happened.']

I'm pretty sure that I need to use re.split to do this but my regex is awful.

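I've tried something like this:

  import re, sys
  with open(sys.argv[1]) as f:  # file.xreadlines() no longer exists in Python 3
      for line in f:
          parts = re.split(r'[[]]', line)  # only matches the literal "[]", so nothing useful is split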

Any help is appreciated!

Try this:

>>> import re
>>> log = '[date] [thread] [loglevel] [class] some text describing the event that happened.'
>>> [part.strip() for part in re.split(r'[\[\]]', log) if part.strip()]
['date', 'thread', 'loglevel', 'class', 'some text describing the event that happened.']

The string is split wherever it sees a [ or ]. Both characters are special in regular expressions, so the pattern passed to re.split escapes them with a backslash. I added part.strip() and if part.strip() to remove the unwanted whitespace and the empty strings the split produces.
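A minimal sketch that wraps the one-liner in a function (the name `split_log_line` is hypothetical) and runs it on the sample line, plus a made-up line whose message itself contains brackets, to show that those get split as well:

```python
import re


def split_log_line(line):
    """Split on every '[' or ']' and drop the whitespace-only pieces."""
    return [part.strip() for part in re.split(r'[\[\]]', line) if part.strip()]


print(split_log_line('[date] [thread] [loglevel] [class] some text describing the event that happened.'))
# ['date', 'thread', 'loglevel', 'class', 'some text describing the event that happened.']

# Hypothetical message containing brackets: it is broken into several pieces too.
print(split_log_line('[date] [thread] [INFO] [Foo] got [42] items'))
# ['date', 'thread', 'INFO', 'Foo', 'got', '42', 'items']
```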

First, \[(.*?)\] will match anything in brackets. The ? makes .*? non-greedy, so each match stops at the first closing ].
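To illustrate why the non-greedy `?` matters, here is a small comparison of both forms with `re.findall` on a shortened sample string (the string is just an example):

```python
import re

s = '[date] [thread]'
lazy = re.findall(r'\[(.*?)\]', s)   # stops at the first ']': ['date', 'thread']
greedy = re.findall(r'\[(.*)\]', s)  # runs to the last ']': ['date] [thread']
print(lazy)
print(greedy)
```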

So, if you want to do that four times:

import re
r = r'\[(.*?)\].*?' * 4
date, thread, loglevel, cls = re.match(r, log).groups()  # "class" is a reserved word in Python
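Multiplying a string by 4 simply concatenates four copies, so the pattern above is the same as writing the four groups out by hand. A quick check on the sample line (`cls` stands in for `class`, which cannot be used as a variable name):

```python
import re

one_field = r'\[(.*?)\].*?'
pattern = one_field * 4
print(pattern)
# \[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?

log = '[date] [thread] [loglevel] [class] some text describing the event that happened.'
date, thread, loglevel, cls = re.match(pattern, log).groups()
print(date, thread, loglevel, cls)  # date thread loglevel class
```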

And, to get the remainder:

r = r'\[(.*?)\].*?' * 4 + r'(.*)'
date, thread, loglevel, cls, text = re.match(r, log).groups()

Or, if you prefer to write it out explicitly:

r = r'\[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?(.*)'

… but personally, I find that way gives me headaches.
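One way to tame the long pattern, sketched here as an alternative rather than as part of the answer above, is `re.VERBOSE` with named groups, plus a guard for lines that don't fit (`re.match` returns `None` there, and calling `.groups()` on it would raise). The names `LOG_RE` and `parse_line` are hypothetical. Unlike the one-liner, this version only allows whitespace (`\s*`) between fields, which also drops the space before the message:

```python
import re

# Whitespace in a VERBOSE pattern is ignored, so each field gets its own line.
LOG_RE = re.compile(r"""
    \[(?P<date>.*?)\]     \s*
    \[(?P<thread>.*?)\]   \s*
    \[(?P<loglevel>.*?)\] \s*
    \[(?P<cls>.*?)\]      \s*
    (?P<text>.*)                # rest of the line; '.' does not match the newline
""", re.VERBOSE)


def parse_line(line):
    """Return a dict of the five fields, or None if the line has a different shape."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


print(parse_line('[date] [thread] [loglevel] [class] some text describing the event that happened.\n'))
print(parse_line('a continuation line without brackets'))  # None
```

Because `.` stops at a newline, lines read straight from a file come back without their trailing `\n` in `text`.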


But if you're having a hard time with the regexps, it might be easier to simplify things. For example…

First, find everything between brackets:

date, thread, loglevel, cls = re.findall(r'\[(.+?)\]', log)

Then find everything after the last closing bracket:

text = log.rpartition(']')[-1].lstrip()
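Putting the two steps together, a minimal sketch of a per-line helper (`parse_simple` is a hypothetical name). It assumes exactly four bracketed fields and no `]` inside the message; any other field count makes the unpacking raise `ValueError`:

```python
import re


def parse_simple(line):
    """Bracketed fields via re.findall, message via str.rpartition."""
    # Unpacking fails with ValueError unless there are exactly four bracketed fields.
    date, thread, loglevel, cls = re.findall(r'\[(.+?)\]', line)
    # strip() instead of lstrip(), so a trailing newline from the file is dropped too.
    text = line.rpartition(']')[-1].strip()
    return date, thread, loglevel, cls, text


print(parse_simple('[date] [thread] [loglevel] [class] some text describing the event that happened.\n'))
```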

It's obviously more verbose than a single regex solution would be, and it's probably slower as well, but if you can understand it and maintain it yourself, that's worth a lot more in the long run.

You could also match the fields instead of splitting the string.

>>> import re
>>> s = "[date] [thread] [loglevel] [class] some text describing the event that happened."
>>> m = re.findall(r'(?<=\[)[^]]*|(?<=]\s)[^\]\[]+', s)
>>> m
['date', 'thread', 'loglevel', 'class', 'some text describing the event that happened.']
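The pattern has two alternatives, each anchored by a lookbehind. Here it is rewritten with `re.VERBOSE` so each alternative gets a comment; the layout is my own, but since the original uses the escape `\s` rather than a literal space, it should behave identically:

```python
import re

s = "[date] [thread] [loglevel] [class] some text describing the event that happened."

pattern = re.compile(r"""
    (?<=\[) [^]]*        # text right after a '[', up to the next ']'
  | (?<=]\s) [^\]\[]+    # text right after '] ' with no brackets in it: the message
""", re.VERBOSE)

print(pattern.findall(s))
```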
You can also try this regex as the separator for re.split:

\]\s\[|\]\s(?=\w)|^\[

See the demo:

http://regex101.com/r/lU7jH1/2
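Used with `re.split`, that separator leaves an empty string at the front (from the `^\[` match), which can be sliced off. A sketch on the sample line; note that the `\]\s(?=\w)` alternative assumes the message starts with a word character:

```python
import re

s = "[date] [thread] [loglevel] [class] some text describing the event that happened."

# Separators: "] [" between fields, "] " before the message, and the leading "[".
parts = re.split(r'\]\s\[|\]\s(?=\w)|^\[', s)
print(parts)      # first element is '' because the string starts with '['
print(parts[1:])  # the five fields
```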
