Python - odd regex matching with + / * on group

Question

>>> src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '
>>> re.search(r'\s*(\w+\.)+', src).groups()
('submod.',)

This regex seems to put everything that is not whitespace into the group, so nothing should be lost before the match stops.

Why does the group contain only the last "+" repetition here, and not ('pkg.subpkg.submod.',)?

Or ('pkg.',), stopping early because no real repetition is recorded, so that no information is "lost" in that other sense?

(I needed to add another group and make the inner one non-capturing with (?:...), like r'\s((?:\w+\.)+)' )

Even more strange:

>>> src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '
>>> re.search(r'\s(\w+\.)*', src).groups()
(None,)

Edit: the "more strange" case is actually "less strange", as @Avinash Raj pointed out: contrary to what I intended, the match simply ends before the group is reached. So

>>> re.search(r'\s+(\w+\.)*', '  pkg.subpkg.submod.thing').groups()
('submod.',)

... then shows the same behavior I am asking about for "+": only the last repetition is captured, and everything before it seems lost.

Answer 1

I'll explain the "even more strange" part.

src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

re.search stops at the first location where the pattern matches. So,

r'\s(\w+\.)*' matches just the first space character (* repeats the preceding group zero or more times). Since the text right after that first space is another space, (\w+\.) cannot match there and the group takes part zero times: groups() on the match object returns (None,), and group() returns the single space that was matched.
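To make this concrete, here is a small sketch that inspects the match object for both variants from the question. The variable names (m_single, m_plus) are only for illustration.

```python
import re

src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

# \s consumes exactly one space; the next character is another space,
# so (\w+\.)* succeeds with zero repetitions and the group stays unset.
m_single = re.search(r'\s(\w+\.)*', src)
print(m_single.span())          # (0, 1)
print(repr(m_single.group(0)))  # ' '
print(m_single.groups())        # (None,)

# \s+ consumes both leading spaces, so the group can now repeat,
# and it keeps only the last repetition.
m_plus = re.search(r'\s+(\w+\.)*', src)
print(repr(m_plus.group(0)))    # '  pkg.subpkg.submod.'
print(m_plus.groups())          # ('submod.',)
```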

Answer 2

I do not see why this seems strange to you. What did you expect?

In the documentation you find the following:

re.search(pattern, string, flags=0) Scan through string looking for the first location where the regular expression pattern ...

re.search(r'\s*(\w+\.)+', src).groups()

Your pattern has only one capturing group: (\w+\.). Because + is greedy, the group is applied repeatedly: it matches pkg., then subpkg., then submod. A capturing group keeps only the text of its last repetition, so it holds submod. when the match ends.

Your second attempt does match, but because * allows zero repetitions, the pattern is already satisfied by the first space alone. The group never takes part in the match, so you find nothing (None) inside it.

Are you looking for this?
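As a rough illustration of the "last repetition wins" rule, the sketch below compares the repeated group with an outer group around a non-capturing repetition, and then splits the captured run with re.findall to recover every repetition. The variable names are made up for this example.

```python
import re

src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

# The repeated capturing group is overwritten on every repetition,
# so only the last one is left in group(1).
m = re.search(r'\s*(\w+\.)+', src)
whole_match = m.group(0)   # '  pkg.subpkg.submod.'
last_only = m.group(1)     # 'submod.'

# Capture the whole run with an outer group, then split it up afterwards.
run = re.search(r'\s*((?:\w+\.)+)', src).group(1)
parts = re.findall(r'\w+\.', run)

print(repr(whole_match), last_only)
print(run)     # pkg.subpkg.submod.
print(parts)   # ['pkg.', 'subpkg.', 'submod.']
```

Nothing is lost from the overall match (group(0)); only the per-group capture is limited to the final repetition.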

re.search(r'\s*((\w+\.)+)', src).groups()[0]

Try out the following to understand it better:

re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i').groups()
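For reference, this is what that experiment gives when run, with a comment on each of the three groups. This is just a sketch that reruns the line above; the name groups is illustrative.

```python
import re

m = re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i')
groups = m.groups()

# group 1: the outer group, holding the whole greedy run
# group 2: the inner repeated group, holding only its last repetition
# group 3: nothing left to match ('i' has no trailing dot), so it is unset
print(groups)   # ('a.b.c.d.e.f.g.h.', 'h.', None)
```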

Answer 3

This should work fine to match the complete string ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '

(\s*(\w+[.\s])+)+

In case you want the output ' pkg.subpkg.submod.thing ' then use this

\s*(\w+[.\s])+
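A quick check of both patterns, assuming the src string from the question (with two spaces before, between, and after the dotted names). Under that assumption the first pattern stops one character short of the end, because the final space cannot be followed by another word; the names m_all and m_first are only for this sketch.

```python
import re

src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

# Outer repetition: covers both dotted names, up to the last-but-one space.
m_all = re.search(r'(\s*(\w+[.\s])+)+', src)
print(repr(m_all.group(0)))    # '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing '

# Single run: stops after the first dotted name and one trailing space.
m_first = re.search(r'\s*(\w+[.\s])+', src)
print(repr(m_first.group(0)))  # '  pkg.subpkg.submod.thing '
print(m_first.groups())        # ('thing ',)
```

Note that the text is read from group(0) here; the inner group again holds only its last repetition.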

3 answers

solution1
1 2017-04-27 10:29:35

solution2
0 2017-04-27 10:42:41

solution3
-1 2017-04-27 10:27:18
