python re.findall weird behaviour

Question

>>> import re
>>> text =\
... """xyxyxy testmatch0
... xyxyxy testmatch1
... xyxyxy
... whyisthismatched1
... xyxyxy testmatch2
...  xyxyxy testmatch3
... xyxyxy
... whyisthismatched2
... """
>>> re.findall(r"^\s*xyxyxy\s+([a-z0-9]+).*$", text, re.MULTILINE)
[u'testmatch0', u'testmatch1', u'whyisthismatched1', u'testmatch2', u'testmatch3', u'whyisthismatched2']
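The session above can be reproduced as a standalone Python 3 script. Printing the whole match (group 0) instead of only the captured group shows exactly which characters the pattern consumed. This is a minimal sketch; the variable names are illustrative and not from the original post.

```python
import re

# Same input as in the session above (note the leading space before testmatch3).
text = """xyxyxy testmatch0
xyxyxy testmatch1
xyxyxy
whyisthismatched1
xyxyxy testmatch2
 xyxyxy testmatch3
xyxyxy
whyisthismatched2
"""

pattern = re.compile(r"^\s*xyxyxy\s+([a-z0-9]+).*$", re.MULTILINE)

# Collect the full text of every match, not just group 1.
matches = [m.group(0) for m in pattern.finditer(text)]
for s in matches:
    print(repr(s))
# 'xyxyxy testmatch0'
# 'xyxyxy testmatch1'
# 'xyxyxy\nwhyisthismatched1'
# 'xyxyxy testmatch2'
# ' xyxyxy testmatch3'
# 'xyxyxy\nwhyisthismatched2'
```

The "\n" inside the third and sixth matches shows that the unwanted matches span two lines: the pattern never needed "." to cross the line break.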

I expected the lines containing "whyisthismatched" not to be matched.

The Python re documentation states the following:

(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
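The documented behaviour of "." is easy to confirm, and the same check can be run on \s, the other class the pattern relies on. A small Python 3 sketch; re.fullmatch is used here only for convenience and is not part of the original question.

```python
import re

# "." does not match a newline in the default mode ...
dot_plain = re.fullmatch(r".", "\n") is not None                # False
# ... but it does once re.DOTALL is given.
dot_dotall = re.fullmatch(r".", "\n", re.DOTALL) is not None    # True
# "\s" matches a newline with no flags at all.
ws = re.fullmatch(r"\s", "\n") is not None                      # True

print(dot_plain, dot_dotall, ws)  # False True True
```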

My question is whether this is really the expected behaviour or a bug. If it is expected, could someone please explain why those lines match and how I should modify my pattern to get the result I expect:

[u'testmatch0', u'testmatch1', u'testmatch2', u'testmatch3']

Answer 1

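Newlines are whitespace too as far as the \s character class is concerned, so the \s+ after xyxyxy happily consumes the line break and ([a-z0-9]+) then captures the first word of the next line. If you want to match spaces only, use [ ] instead: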

>>> re.findall(r"^\s*xyxyxy[ ]+([a-z0-9]+).*$", text, re.MULTILINE)
[u'testmatch0', u'testmatch1', u'testmatch2', u'testmatch3']
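If tabs may also separate the keyword from the value, one alternative (not part of the original answer) is the class [^\S\n], which matches any whitespace character except a newline. A minimal Python 3 sketch, assuming the same input text as in the question:

```python
import re

text = """xyxyxy testmatch0
xyxyxy testmatch1
xyxyxy
whyisthismatched1
xyxyxy testmatch2
 xyxyxy testmatch3
xyxyxy
whyisthismatched2
"""

# [^\S\n] reads "neither non-whitespace nor newline", i.e. any whitespace
# character other than "\n" (space, tab, ...), so it can never cross a line.
pattern = re.compile(r"^[^\S\n]*xyxyxy[^\S\n]+([a-z0-9]+).*$", re.MULTILINE)

found = pattern.findall(text)
print(found)  # ['testmatch0', 'testmatch1', 'testmatch2', 'testmatch3']

# Unlike [ ]+, the class also accepts a tab between the keyword and the value.
print(pattern.findall("xyxyxy\ttabbed\n"))  # ['tabbed']
```

Using the class for the leading ^\s* as well keeps the whole match on a single line, which is usually what a line-oriented pattern intends.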

solution1 6 ACCEPTED 2013-04-09 16:37:12