I have a string like this: \input{{whatever}{1}}\mypath{{path1}{path2}{path3}...{pathn}}\shape{{0.2}{0.3}}
I would like to capture all the paths: path1, path2, ..., pathn. I tried the re
module in Python. However, it does not keep every repetition of a repeated capture group. For example: r"\\mypath\{(\{[^\{\}\[\]]*\})*\}"
will only return the last repetition of the group. Applying the pattern with search(r"\mypath{{path1}{path2}}")
will only return groups()
as ("{path2}",)
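A minimal sketch of that behavior, assuming the doubled backslashes above are display escaping and the intended pattern is the one below. The variable names are only illustrative. Each repetition of the group overwrites the previous one, so only the last survives in groups():

```python
import re

# A capturing group inside a * repetition keeps only its last iteration.
pattern = re.compile(r"\\mypath\{(\{[^{}\[\]]*\})*\}")
match = pattern.search(r"\mypath{{path1}{path2}}")

whole = match.group(0)    # the full matched text
groups = match.groups()   # only the last repetition of the group
print(whole)
print(groups)
```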
Then I found an alternative way to do this:
>>> gpathRegexPat=r"(?:\\mypath\{)((\{[^\{\}\[\]]*\})*)(?:\})"
>>> gpathRegexCp=re.compile(gpathRegexPat)
>>> strpath=gpathRegexCp.search(r'\mypath{{sadf}{ad}}').groups()[0]
>>> strpath
'{sadf}{ad}'
>>> p=re.compile(r'\{([^\{\}\[\]]*)\}')
>>> p.findall(strpath)
['sadf', 'ad']
or:
>>> gpathRegexPat=r"\\mypath\{(\{[^{}[\]]*\})*\}"
>>> gpathRegexCp=re.compile(gpathRegexPat, flags=re.I|re.U)
>>> strpath=gpathRegexCp.search(r'\input{{whatever]{1}}\mypath{{sadf}{ad}}\shape{{0.2}{0.1}}').group()
>>> strpath
'\\mypath{{sadf}{ad}}'
>>> p.findall(strpath)
['sadf', 'ad']
At this point I thought: why not just use findall on the original string? I could use: gpathRegexPat=r"(?:\\mypath\{)(?:\{[^\{\}\[\]]*\})*?\{([^\{\}\[\]]*)\}(?:\{[^\{\}\[\]]*\})*?(?:\})"
My reasoning was: if the first (?:\{[^\{\}\[\]]*\})*? matches 0 times and the second (?:\{[^\{\}\[\]]*\})*? matches 1 time, it will capture sadf; if the first matches 1 time and the second matches 0 times, it will capture ad.
However, findall only returns ['sadf'] with this regex.
Without all those extra patterns ( (?:\\mypath\{) and (?:\}) ), it actually works:
>>> p2=re.compile(r'(?:\{[^\{\}\[\]]*\})*?\{([^\{\}\[\]]*)\}(?:\{[^\{\}\[\]]*\})*?')
>>> p2.findall(strpath)
['sadf', 'ad']
>>> p2.findall('{adadd}{dfada}{adafadf}')
['adadd', 'dfada', 'adafadf']
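One way to see the difference is to print the span of each match. This is only a sketch with illustrative names (inner, anchored, unanchored). It relies on the fact that findall and finditer return non-overlapping matches: the anchored pattern has to consume the whole \mypath{...} block in a single match, so there is nothing left for a second match to start from.

```python
import re

text = r"\mypath{{sadf}{ad}}"
inner = r"\{[^{}\[\]]*\}"  # one {...} item without nested braces

# Anchored: one match must run from \mypath{ to the closing brace.
anchored = re.compile(
    r"\\mypath\{(?:%s)*?\{([^{}\[\]]*)\}(?:%s)*?\}" % (inner, inner)
)
# Unanchored: each match may stop right after a single {...} item.
unanchored = re.compile(r"(?:%s)*?\{([^{}\[\]]*)\}(?:%s)*?" % (inner, inner))

anchored_hits = [(m.group(1), m.span()) for m in anchored.finditer(text)]
unanchored_hits = [(m.group(1), m.span()) for m in unanchored.finditer(text)]
print(anchored_hits)    # a single match covering the whole string
print(unanchored_hits)  # one short match per item
```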
Can anyone explain this behavior to me? Is there any smarter way to achieve the result I want?
You are right. A repeated capture group only keeps its last repetition, so it is not possible to return every repeated subgroup from a single group. To do what you want, you can use one regular expression to capture the whole group and then a second regular expression to capture the repeated subgroups.
In this case that would be something like: \\mypath\{(\{.*?\})\}
Note that the group has to be a capturing one. Group 1 will then contain {path1}{path2}{path3}
Then, to find the repeating {pathn} patterns inside that string, you can simply use \{(.*?)\}
This will match anything within the braces. The .*? is a non-greedy version of .*, meaning it will match as few characters as possible instead of as many as possible.
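The two-step idea could be sketched as follows. This is not necessarily the exact code the answer has in mind: the sample string is shortened (the "..." placeholder is dropped), and a negated character class [^{}]* is swapped in for .*? so that an item can never run past a brace.

```python
import re

# Shortened sample input; the "..." placeholder from the question is omitted.
text = r"\input{{whatever}{1}}\mypath{{path1}{path2}{path3}}\shape{{0.2}{0.3}}"

# Step 1: capture everything between the outer braces of \mypath{...}.
outer = re.search(r"\\mypath\{((?:\{[^{}]*\})*)\}", text)

# Step 2: pull each {...} item out of the captured block.
paths = re.findall(r"\{([^{}]*)\}", outer.group(1)) if outer else []
print(paths)
```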
re.findall("{([^{}]+)}",text)
should work. If text holds only the \mypath{...} part, it
returns
['path1', 'path2', 'path3', 'pathn']
Finally:
import re
my_path = r"\input{{whatever}{1}}\mypath{{path1}{path2}{path3}...{pathn}}\shape{{0.2}{0.3}}"
# get the \mypath part
my_path2 = [p for p in my_path.split("\\") if p.startswith("mypath")][0]
print(re.findall("{([^{}]+)}", my_path2))
Or, even better:
re.findall(r"{(path\d+)}",text) # will only return things like path<num> inside {}
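To illustrate the difference between the two patterns, here is a small sketch on a hypothetical input whose paths really end in digits (the literal placeholder pathn from the question would not match path\d+). Applied to the full string, the generic pattern also picks up the arguments of \input and \shape, while the path-specific one does not:

```python
import re

# Hypothetical sample with numeric path names instead of the placeholder "pathn".
text = r"\input{{whatever}{1}}\mypath{{path1}{path2}{path10}}\shape{{0.2}{0.3}}"

only_paths = re.findall(r"\{(path\d+)\}", text)   # path<num> items only
everything = re.findall(r"\{([^{}]+)\}", text)    # every innermost {...} item
print(only_paths)
print(everything)
```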