Python Regex pattern findall

Question

The regular expression below

import re

get_tags = lambda t: re.findall(r"<(.+)>", t)
st = "xyx<ab>xy x<bc> xyx<cd>xyxy xx<de> xyx <ef>x y<fg><gh>y"

print(get_tags(st))

The expected output was

['ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh']

Even though I assumed the pattern is not greedy (no '*' is used), the expression gives this output instead:

['ab>xy x<bc> xyx<cd>xyxy xx<de> xyx <ef>x y<fg><gh']

What is the problem in the pattern?

Answer 1

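Both * and + are greedy by default, so .+ matches as much as it can and runs from the first < to the last >. Add the reluctant (non-greedy) modifier ? right after the + to make it match as few characters as possible.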

get_tags = lambda t: re.findall(r"<(.+?)>", t)

OR

get_tags = lambda t: re.findall(r"<([^<>]+)>", t)

[^<>]+ is a negated character class that matches any character except < or >, one or more times.

>>> get_tags = lambda t: re.findall(r"<(.+?)>", t)
>>> st = "xyx<ab>xy x<bc> xyx<cd>xyxy xx<de> xyx <ef>x y<fg><gh>y"
>>> print(get_tags(st))
['ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh']
>>> get_tags = lambda t: re.findall(r"<([^<>]+)>", t)
>>> print(get_tags(st))
['ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh']
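
The two fixes give the same result on the question's string, but they can differ on other input. A minimal sketch, assuming a made-up string (not from the question) that contains a stray < inside a tag:

```python
import re

# Hypothetical input: the first tag contains an extra '<'.
text = "x<a<b>y<c>z"

greedy = re.findall(r"<(.+)>", text)        # runs from the first '<' to the last '>'
lazy = re.findall(r"<(.+?)>", text)         # stops at the first '>' after each '<'
negated = re.findall(r"<([^<>]+)>", text)   # never crosses a '<' or a '>'

print(greedy)   # ['a<b>y<c']
print(lazy)     # ['a<b', 'c']
print(negated)  # ['b', 'c']
```

The lazy version still starts at the first < it finds, so it can swallow a later <. The negated character class cannot, which may be the safer choice if the input is not well formed.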

Answer 2

Since you only want to find letters between < and >, you could also use

get_tags = lambda t: re.findall(r"<(\w+)>", t)

as the regex. \w+ matches only word characters between < and > (letters, digits and the underscore, so slightly more than [A-Za-z]). Since there are no spaces or other characters between the brackets in your example, this would also work.
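
To see where \w+ and the negated character class diverge, here is a small sketch on an illustrative string (the sample tags are assumptions, not taken from the question):

```python
import re

# Hypothetical sample: tags containing a space, an underscore, a digit and a hyphen.
sample = "<ab> <a b> <a_1> <a-b>"

word_only = re.findall(r"<(\w+)>", sample)      # only letters, digits, underscore
any_inner = re.findall(r"<([^<>]+)>", sample)   # anything except '<' and '>'

print(word_only)  # ['ab', 'a_1']
print(any_inner)  # ['ab', 'a b', 'a_1', 'a-b']
```

So \w+ is the stricter pattern: it silently skips tags with spaces or punctuation, which is fine for the question's data but may not be what you want in general.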

2 answers

solution1: 2, ACCEPTED, 2014-12-05 11:23:03
solution2: 0, 2014-12-05 12:24:38
