Shortest string between two substrings

Question

I am trying to parse IRC logs, like this one:

2013-09-26T01:52:40  <Shan-x> some stuff

I want to extract the nickname, so I use re:

re.search('%s(.*)%s' % ('<', '>'), s).group(1)

But if the log line looks like this:

2013-09-26T01:52:40  <Shan-x> some stuff > foo bar

Then I get Shan-x> some stuff instead. How can I extract only the nickname?

Answer 1

You need to make the .* non-greedy by adding a ? after the * quantifier:

re.search('%s(.*?)%s' % ('<', '>'), s).group(1)

Now .*? matches as few characters as possible while still letting the rest of the pattern match, rather than as many as possible, which is the default.

Not sure why you use string interpolation here, though; for static characters, just use:

re.search('<(.*?)>', s).group(1)
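To see the difference on the line from the question, here is a minimal sketch comparing the greedy and non-greedy forms. The variable names are only illustrative.

```python
import re

# Sample log line from the question, with a stray '>' in the message.
s = "2013-09-26T01:52:40  <Shan-x> some stuff > foo bar"

# Greedy: .* runs to the end of the line, then backtracks to the LAST '>'.
greedy = re.search(r"<(.*)>", s).group(1)

# Non-greedy: .*? stops at the FIRST '>' that lets the pattern match.
lazy = re.search(r"<(.*?)>", s).group(1)

print(repr(greedy))  # 'Shan-x> some stuff '
print(repr(lazy))    # 'Shan-x'
```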

You could also capture all characters that do not match the end character:

re.search('<([^>]*)>', s).group(1)

Here [^>] is a negated character class: it matches any single character that is not listed after the ^, so any character other than > qualifies.
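As a rough sketch of how the negated class could be used over a whole log, the helper below also guards against lines with no <...> part, since re.search returns None when nothing matches and calling .group(1) on None raises AttributeError. The names NICK_RE and extract_nick are hypothetical, not from the question, and the second sample line is made up for illustration.

```python
import re

# Hypothetical names for illustration; compiled once for reuse across lines.
NICK_RE = re.compile(r"<([^>]*)>")

def extract_nick(line):
    """Return the text inside the first <...> on the line, or None if absent."""
    m = NICK_RE.search(line)
    return m.group(1) if m else None

print(extract_nick("2013-09-26T01:52:40  <Shan-x> some stuff > foo bar"))  # Shan-x
print(extract_nick("2013-09-26T01:52:40  a line without a nickname"))      # None
```

Because [^>]* can never run past a >, this form does not depend on backtracking to find the right closing bracket.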