Error in regular expression to find the text between parentheses

Question

I have a string

string  ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

I want to split it into a list based on the parentheses, so, following solutions I found elsewhere, I used

import re

my_data = re.findall(r"(\(.*?\))", string)

But when I print my_data, I get 4 items (len = 4):

['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']

But my desired output has 2 items (len = 2):

['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']

That is because "(clearance) AND (embedded) AND (software engineer OR developer)" is inside one pair of parentheses and "(embedded)" is inside another. But re.findall splits the string into 4 items. Why?

How can I modify the regular expression to get my desired output?
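To see where the four items come from, here is a small sketch comparing the question's lazy pattern with a greedy variant on the same string. The greedy pattern is added only for comparison; neither one tracks nesting.

```python
import re

string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

# Lazy: ".*?" stops at the FIRST ")" after each "(", so the outer group
# is cut off at the ")" that closes "(clearance)".
lazy = re.findall(r"\(.*?\)", string)

# Greedy: ".*" runs to the LAST ")" in the string, so everything
# between the first "(" and the final ")" becomes a single match.
greedy = re.findall(r"\(.*\)", string)

print(lazy)    # ['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']
print(greedy)  # the whole string as one match
```

Neither quantifier knows how many parentheses are still open, which is why a plain pattern ends either too early or too late.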

Answer 1

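Your pattern fails because the lazy ".*?" stops at the first ")" after each "(", so the first match ends at the ")" that closes (clearance) instead of the one that closes the outer group. Python's re module cannot match arbitrarily nested parentheses, because that requires counting them, so here is an idea that counts the parentheses instead: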

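def find_stuff(string):
    # Return the start and end indices of every top-level parenthesized group.
    indices = []
    counter = 0                      # current nesting depth
    change = {"(": 1, ")": -1}       # how each character changes the depth
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            indices.append(i)        # a top-level "(" opens a group
        elif counter == 1 and new_count == 0:
            indices.append(i + 1)    # its matching ")" closes it; store one past it for slicing
        counter = new_count
    return indices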

This is not very elegant, but I think the concept is clear. It returns the indices of the top-level parentheses (start, end, start, end, ...), so you can slice your string with them.
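A minimal sketch of the same depth-counting idea that slices the groups directly instead of returning indices. The function name split_top_level is hypothetical, not part of the answer above; it assumes the input's parentheses are balanced apart from possible stray closing ones.

```python
def split_top_level(text):
    """Hypothetical helper: return each top-level parenthesized group,
    outer parentheses included, by tracking the nesting depth."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i                          # a new top-level group begins here
            depth += 1
        elif ch == ")" and depth > 0:              # ignore a ")" with nothing open
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])   # the top-level group just closed
    return groups


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
print(split_top_level(string))
# ['((clearance) AND (embedded) AND (software engineer OR developer))', '(embedded)']
```

Each group keeps its outer parentheses; g[1:-1] strips them if you only want the contents, which for the first group gives '(clearance) AND (embedded) AND (software engineer OR developer)'. Unlike find_stuff, the depth > 0 check skips a stray ")" instead of letting the depth go negative.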

Answer 2

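A bit of an re hack, but this is possible. It relies on the shape of this particular input (one outer group that starts with "((" and ends with "))", plus flat groups elsewhere), so it does not handle arbitrary nesting: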

>>> import re
>>> string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
>>> [e for e in re.split(r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))',string) if e and '(' in e and ')' in e]
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
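To show what the filter in the list comprehension is removing, here is a sketch that runs the same pattern as a script and keeps the raw re.split result. The variable names pieces and result are illustrative only.

```python
import re

string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

# Alternative 1: "(" directly followed by "(", captured lazily up to the first "))".
# Alternative 2: a "(...)" with no parentheses inside, not directly
# preceded by "(" and not directly followed by ")".
pattern = r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))'

# re.split includes every capturing group in its result, using None for
# the group of the alternative that did not take part in a match.
pieces = re.split(pattern, string)
print(pieces)
# ['', '(clearance) AND (embedded) AND (software engineer OR developer)', None, ' AND ', None, '(embedded)', '']

# Keep only the non-empty pieces that contain parentheses.
result = [e for e in pieces if e and '(' in e and ')' in e]
print(result)
# ['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
```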

2 answers

solution1
3 ACCEPTED 2018-12-12 16:10:26

solution2
1 2018-12-12 19:30:37
