I have a string
string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
I want to split it into a list based on the outer parentheses. Following solutions given elsewhere, I used
my_data = re.findall(r"(\(.*?\))",string)
but when I print my_data, the output is (len = 4)
['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']
but my desired output is (len = 2)
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
because "(clearance) AND (embedded) AND (software engineer OR developer)" is inside one pair of parentheses and "embedded" is inside another. Why does re.findall split the string into 4 items?
How should I modify the regular expression to get my desired output?
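re.findall returns four items because the lazy .*? stops at the first ")" it reaches, so every match ends at the nearest closing parenthesis regardless of nesting. Python's re module has no recursive patterns, so it cannot match arbitrarily nested parentheses on its own; here is an idea that counts parentheses instead: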
def find_stuff(string):
    indices = []
    counter = 0  # current nesting depth
    change = {"(": 1, ")": -1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            indices.append(i)  # an outer "(" opens here
        elif counter == 1 and new_count == 0:
            indices.append(i + 1)  # one past the matching outer ")"
        counter = new_count
    return indices
This is not very beautiful, but I think the concept is clear. It returns the indices of the outer parentheses as alternating start and end positions (the end is one past the closing parenthesis), so you can just slice your string with them.
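To make the slicing step concrete, here is a minimal sketch of the same depth-counting idea that collects the substrings directly instead of returning indices. The name outer_groups is hypothetical and not part of the answer above, and skipping a stray ")" is an assumption about how unbalanced input should be handled.

```python
def outer_groups(text):
    """Return the top-level parenthesised substrings of text (hypothetical helper)."""
    groups = []
    depth = 0      # current nesting depth
    start = None   # index of the currently open outermost "("
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # an outermost "(" opens here
            depth += 1
        elif ch == ")" and depth > 0:  # assumption: a stray ")" is ignored
            depth -= 1
            if depth == 0:
                # slice from the outer "(" up to and including its ")"
                groups.append(text[start:i + 1])
    return groups


query = "((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)"
groups = outer_groups(query)
print(groups)
# ['((clearance) AND (embedded) AND (software engineer OR developer))', '(embedded)']
```

Each group keeps its own outer parentheses, so the first item still starts with "((". If one layer should be removed, as in the first item of the desired output, slicing a group with [1:-1] drops it.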
A bit of an re hack, but this is possible:
>>> import re
>>> string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
>>> [e for e in re.split(r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))',string) if e and '(' in e and ')' in e]
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']