I am trying to parse IRC logs, like this one:
2013-09-26T01:52:40 <Shan-x> some stuff
I want to extract the nickname, so I use re:
re.search('%s(.*)%s' % ('<', '>'), s).group(1)
But if the log line looks like this:
2013-09-26T01:52:40 <Shan-x> some stuff > foo bar
Then I get this: Shan-x> some stuff. How can I parse the line to get only the nickname?
You need to make the .* non-greedy by adding a ? to the * quantifier:
re.search('%s(.*?)%s' % ('<', '>'), s).group(1)
Now the .*? matches the minimum number of characters that satisfies the pattern, rather than the default maximum.
Not sure why you use string interpolation here, though; for static characters, just use:
re.search('<(.*?)>', s).group(1)
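To see the difference between the two quantifiers, here is a small sketch that runs both patterns on the second log line from the question. The variable names (line, greedy, lazy) are only illustrative.

```python
import re

# Second sample line from the question, containing an extra '>' in the message.
line = "2013-09-26T01:52:40 <Shan-x> some stuff > foo bar"

# Greedy: .* grabs as much as it can, so the match runs up to the LAST '>'.
greedy = re.search(r"<(.*)>", line).group(1)

# Non-greedy: .*? grabs as little as it can, so the match stops at the FIRST '>'.
lazy = re.search(r"<(.*?)>", line).group(1)

print(repr(greedy))  # 'Shan-x> some stuff '
print(repr(lazy))    # 'Shan-x'
```

Note that the greedy capture also keeps the space before the last >, which is easy to miss when the result is printed without repr().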
You could also capture all characters that do not match the end character:
re.search('<([^>]*)>', s).group(1)
Here [^>] forms a negated character class, which matches any character that is not listed in the class; so any character other than > qualifies.
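If you are going over a whole log file, it may be worth wrapping this in a small helper, since re.search returns None for lines without a nickname and calling .group(1) on that raises AttributeError. The following is only a sketch: extract_nick is a hypothetical name, and the ^\S+ prefix assumes every line starts with a timestamp containing no spaces, followed by a single space.

```python
import re

# Assumption: each line is "<timestamp> <nick> message", where the timestamp
# has no spaces. The negated class [^>]* cannot run past the first '>'.
NICK_RE = re.compile(r"^\S+ <([^>]*)>")


def extract_nick(line):
    """Return the nickname from one log line, or None if there is none."""
    match = NICK_RE.search(line)
    return match.group(1) if match else None


print(extract_nick("2013-09-26T01:52:40 <Shan-x> some stuff > foo bar"))  # Shan-x
print(extract_nick("2013-09-26T01:52:40 no nickname on this line"))       # None
```

Anchoring on the start of the line also avoids picking up a <...> that merely appears inside the message text.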