In a log file I've got the following format on each line:
[date] [thread] [loglevel] [class] some text describing the event that happened.
I'd like to iterate through the logs and split the strings so that I have the following: ['date','thread','loglevel','class','some text describing the event that happened.']
I'm pretty sure that I need to use re.split to do this but my regex is awful.
Trying something like this:
for line in open(sys.argv[1]).xreadlines():
    parts = re.split(r'[[]]',line)
Any help is appreciated!
Try this:
>>> import re
>>> log = '[date] [thread] [loglevel] [class] some text describing the event that happened.'
>>> [part.strip() for part in re.split(r'[\[\]]', log) if part.strip()]
['date', 'thread', 'loglevel', 'class', 'some text describing the event that happened.']
The string is split wherever a [ or ] appears. Inside the re.split pattern these characters have to be escaped. I added the part.strip() call and the if part.strip() filter to remove unwanted whitespace and empty strings.
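Applied to a whole file, the same split can be wrapped in a small helper. This is only a sketch: split_log_line is a name chosen for illustration, and the sample lines stand in for the real log file.

```python
import re

def split_log_line(line):
    """Split on '[' or ']' and drop empty or whitespace-only pieces."""
    return [part.strip() for part in re.split(r'[\[\]]', line) if part.strip()]

# Hypothetical sample lines; in practice iterate over open(sys.argv[1]).
sample_lines = [
    '[date] [thread] [loglevel] [class] some text describing the event that happened.\n',
    '\n',
]

for line in sample_lines:
    parts = split_log_line(line)
    if parts:  # skip blank lines
        print(parts)
```

Note that a message which itself contains [ or ] would be split into extra pieces by this approach.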
First, \[(.*?)\]
will match anything in brackets, stopping at the first closing bracket because .*? is non-greedy.
So, if you want to do that four times:
r = r'\[(.*?)\].*?' * 4
# "class" is a reserved word in Python, so the last name is cls
date, thread, loglevel, cls = re.match(r, log).groups()
And, to get the remainder:
r = r'\[(.*?)\].*?' * 4 + r'(.*)'
date, thread, loglevel, cls, text = re.match(r, log).groups()
Or, if you prefer to write it out explicitly:
r = r'\[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?\[(.*?)\].*?(.*)'
… but personally, I find that way gives me headaches.
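If the written-out pattern is hard to read, one option is a verbose pattern with named groups. This is a sketch rather than a drop-in replacement: the name LOG_LINE and the group names are my own choices, and \s* between fields assumes the fields are separated only by whitespace.

```python
import re

# re.VERBOSE ignores whitespace inside the pattern, so each field gets its own line.
LOG_LINE = re.compile(r'''
    \[(?P<date>.*?)\]\s*
    \[(?P<thread>.*?)\]\s*
    \[(?P<loglevel>.*?)\]\s*
    \[(?P<cls>.*?)\]\s*
    (?P<text>.*)
''', re.VERBOSE)

log = '[date] [thread] [loglevel] [class] some text describing the event that happened.'
m = LOG_LINE.match(log)
if m:  # match() returns None for lines that do not fit the format
    fields = m.groupdict()
    print(fields['loglevel'], fields['text'])
```

The groupdict() result lets you refer to fields by name instead of by position.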
But if you're having a hard time with the regexps, it might be easier to simplify things. For example…
First, find everything between brackets:
date, thread, loglevel, cls = re.findall(r'\[(.+?)\]', log)
Then find everything after the last brackets:
text = log.rpartition(']')[-1].lstrip()
It's obviously more verbose than a single regex solution would be, and it's probably slower as well, but if you can understand it and maintain it yourself, that's worth a lot more in the long run.
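One caveat with the two-step version: if the message text itself contains brackets, findall returns more than four items (so the unpacking fails) and rpartition(']') cuts the text at the last bracket. A possible workaround, sketched here with a hypothetical helper named parse, is to take only the first four matches and slice the text from the end of the fourth:

```python
import re
from itertools import islice

def parse(log):
    # Take only the first four bracketed fields, so brackets inside the
    # message text are left alone.
    matches = list(islice(re.finditer(r'\[(.+?)\]', log), 4))
    if len(matches) < 4:
        raise ValueError('expected four bracketed fields: %r' % log)
    date, thread, loglevel, cls = (m.group(1) for m in matches)
    # Everything after the fourth closing bracket is the message.
    text = log[matches[-1].end():].strip()
    return date, thread, loglevel, cls, text

print(parse('[date] [thread] [ERROR] [class] failed on items[3] again'))
```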
You could match the parts you want instead of splitting the line.
>>> import re
>>> s = "[date] [thread] [loglevel] [class] some text describing the event that happened."
>>> m = re.findall(r'(?<=\[)[^]]*|(?<=]\s)[^\]\[]+', s)
>>> m
['date', 'thread', 'loglevel', 'class', 'some text describing the event that happened.']