I'm trying to find the text between the parentheses, but I want every overlapping match, something like this:
import re

s = "( abc (def) kkk ( mno) sdd ( xyz ) )"
p = re.findall(r"\(.*?\)", s)
for i in p:
    print(i)
Output:
( abc (def) ,
( mno),
( xyz )
Expected:
( abc (def) ,
( abc (def) kkk ( mno) ,
( abc (def) kkk ( mno) sdd ( xyz ) ,
( abc (def) kkk ( mno) sdd ( xyz ) ) ,
(def) ,
(def) kkk ( mno) ,
(def) kkk ( mno) sdd ( xyz ) ,
(def) kkk ( mno) sdd ( xyz ) ) ,
( mno) ,
( mno) sdd ( xyz ) ,
( mno) sdd ( xyz ) ) ,
( xyz ) ,
( xyz ) )
Python's re module does not return overlapping matches. It is easier to find the positions of ( and ) in your text, build sensible (start, end) tuples from them and slice your string.
Using enumerate(iterable), collections.defaultdict() and itertools.product():
from collections import defaultdict
from itertools import product

s = "( abc (def) kkk ( mno) sdd ( xyz ) )"

# get positions of all opening and closing ()
d = defaultdict(list)
for idx, c in enumerate(s):
    if c in "()":
        d[c].append(idx)
print(d)

# combine all positions
pos = list(product(d["("], d[")"]))
print(pos)

# slice the text if the opening ( comes before the closing ), else skip
for start, stop in pos:
    if start < stop:
        print(s[start:stop + 1])
Output:
# d
defaultdict(<class 'list'>, {'(': [0, 6, 16, 27], ')': [10, 21, 33, 35]})
# pos
[(0, 10), (0, 21), (0, 33), (0, 35), (6, 10), (6, 21), (6, 33), (6, 35),
(16, 10), (16, 21), (16, 33), (16, 35), (27, 10), (27, 21), (27, 33), (27, 35)]
# texts from pos
( abc (def)
( abc (def) kkk ( mno)
( abc (def) kkk ( mno) sdd ( xyz )
( abc (def) kkk ( mno) sdd ( xyz ) )
(def)
(def) kkk ( mno)
(def) kkk ( mno) sdd ( xyz )
(def) kkk ( mno) sdd ( xyz ) )
( mno)
( mno) sdd ( xyz )
( mno) sdd ( xyz ) )
( xyz )
( xyz ) )
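If you need the substrings as a list rather than printed output, the same idea can be wrapped in a small function. This is only a sketch of the approach above: the name paren_spans is made up for illustration, and it uses plain list comprehensions instead of defaultdict and product.

```python
def paren_spans(text):
    """Return every substring that starts at a "(" and ends at a later ")"."""
    # Positions of all opening and closing parentheses, in order of appearance.
    opens = [i for i, ch in enumerate(text) if ch == "("]
    closes = [i for i, ch in enumerate(text) if ch == ")"]
    # Keep only pairs where the opening position comes before the closing one.
    return [text[a:b + 1] for a in opens for b in closes if a < b]


spans = paren_spans("( abc (def) kkk ( mno) sdd ( xyz ) )")
for span in spans:
    print(span)
```

Like the code above, this pairs every ( with every later ), so it does not check whether the parentheses are balanced.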