>>> src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '
>>> re.search(r'\s*(\w+\.)+', src).groups()
('submod.',)
This regex seems to put everything that is not whitespace into the group, so nothing should be lost before the match stops.
Why does the group hold only the last repetition of the "+" here, and not ('pkg.subpkg.submod.',)? Or ('pkg.',), stopping early because the group itself is not really repeated, which would avoid the "loss of information" in another sense?
(I had to add an outer group and make the inner one non-capturing with (?:...), as in r'\s((?:\w+\.)+)'.)
Even more strange:
>>> src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '
>>> re.search(r'\s(\w+\.)*', src).groups()
(None,)
Edit: the "more strange" case is actually "less strange", as @Avinash Raj pointed out: unlike what I intended, the match simply ends before the group is reached. So
>>> re.search(r'\s+(\w+\.)*', ' pkg.subpkg.submod.thing').groups()
('submod.',)
... then shows the same puzzling behavior as with "+": only the last repetition is captured, and everything before it seems lost.
I'll explain the "even more strange" part.
src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '
re.search stops as soon as it finds the first match. So r'\s(\w+\.)*' matches just the first space character: * repeats the preceding pattern zero or more times, and (\w+\.) cannot match right after the first space, because the next character is another space rather than a word character, so the group takes part zero times. That is why groups() on the match object returns (None,), while group() returns the first space.
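A quick way to check this explanation is sketched below. It assumes the input starts with two spaces, which is what the (None,) result implies; the variable names are only for illustration.

```python
import re

# Assumption: the input begins with two spaces, so the character right after
# the first whitespace is another space, not a word character.
src = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

m = re.search(r'\s(\w+\.)*', src)
whole_match = m.group()   # text matched by the entire pattern
captured = m.groups()     # one entry per capturing group
span = m.span()           # where the match starts and ends

print(repr(whole_match), captured, span)
```

The match covers only the first character of the string, and the group reports None because it never took part in the match.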
I do not see why this seems strange to you. What did you expect?
In the documentation you find the following:
re.search(pattern, string, flags=0) Scan through string looking for the first location where the regular expression pattern ...
re.search(r'\s*(\w+\.)+', src).groups()
Your pattern has only one capturing group: (\w+\.). Because + is greedy by default, the group is matched again and again, consuming pkg. and subpkg. before it reaches submod. A repeated group keeps only the text of its last iteration, so submod. is what you find in it.
Your second attempt does match, but * allows zero repetitions, so the match can succeed without the group taking part at all; in that case the group holds nothing (None).
Is this what you are looking for?
re.search(r'\s*((\w+\.)+)', src).groups()[0]
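To make the difference concrete, here is a small sketch comparing the repeated group with the wrapped version, plus one possible way to recover each repetition separately. The variable names are illustrative, and the findall step is just one option, not something the answers above prescribe.

```python
import re

src = ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '

# A repeated capturing group is overwritten on every iteration,
# so only the text of the last repetition survives.
last_only = re.search(r'\s*(\w+\.)+', src).group(1)

# Wrapping the repetition in an outer group captures the whole run;
# the inner group is made non-capturing with (?:...).
whole_run = re.search(r'\s*((?:\w+\.)+)', src).group(1)

# To get every repetition, match the pieces individually on the captured run.
pieces = re.findall(r'\w+\.', whole_run)

print(last_only, whole_run, pieces)
```

Python's re module does not keep a history of earlier iterations of a group, so splitting the captured run in a second step is a common workaround.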
Try out the following to understand it better:
re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i').groups()
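Running that experiment and labelling the three groups may help; this is only a sketch, and the names outer, inner and trailing are made up for illustration.

```python
import re

m = re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i')
outer, inner, trailing = m.groups()

# outer:    the whole greedy run of "letter + dot" pieces
# inner:    only the last iteration of the repeated group inside it
# trailing: None, because the first repetition already consumed everything
#           it could, leaving zero iterations for the second one
print(repr(outer), repr(inner), repr(trailing))
```

The final i is not part of the match at all, since no dot follows it.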
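This should work to match the complete string ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing ':
(\s*(\w+[.\s])+)+
If you only want the first entry, ' pkg.subpkg.submod.thing ', then use this:
\s*(\w+[.\s])+
Note that this stops after the first entry only if the entries are separated by two or more whitespace characters; with a single space between them, [.\s] consumes that space and the match continues into pkg2.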
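How far the second pattern reaches depends on the spacing between the entries. The sketch below compares a single-spaced and a double-spaced input; both strings are assumptions made for illustration.

```python
import re

pattern = r'\s*(\w+[.\s])+'

# Two hypothetical inputs that differ only in the amount of whitespace.
single_spaced = ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '
double_spaced = '  pkg.subpkg.submod.thing  pkg2.subpkg.submod.thing  '

match_single = re.search(pattern, single_spaced).group()
match_double = re.search(pattern, double_spaced).group()

print(repr(match_single))  # runs through both entries
print(repr(match_double))  # stops after the first entry
```

With single spaces, [.\s] treats the separating space like a dot, so the repetition carries on into the second entry. With double spaces, the second space cannot start a \w+ piece and the match ends there.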