Extract substrings from logical expressions

Let's say I have a string that looks like this:

myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'

What I would like to obtain in the end would be:

myStr_l1 = '(Txt_l1) or (Txt2_l1)'

and

myStr_l2 = '(Txt_l2) or (Txt2_l2)'

Some properties:

  • all "Txt_"-elements of the string start with an uppercase letter

  • the string can contain many more elements (so there could also be Txt3, Txt4, ...)

  • the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)

I found a way to get the first part done by using:

myStr_l1 = re.sub(r'\(\w+\)', '', myStr)

which gives me

'(Txt_l1 ) or (Txt2_l1 )'

However, I don't know how to obtain myStr_l2. My idea was to remove everything between two opening parentheses. But when I do something like this:

re.sub(r'\(\w+\(', '', myStr)

the entire string is returned unchanged.

re.sub(r'\(.*\(', '', myStr)

removes - of course - far too much and gives me

'Txt2_l2))'

Does anyone have an idea how to get myStr_l2?

When there is an "and" instead of an "or", the strings look slightly different:

myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'

Then I can still use the command from above:

re.sub(r'\(\w+\)', '', myStr2)

which gives:

'(Txt_l1  and Txt2_l1 )'

but again I fail to get myStr2_l2. How would I do this for this kind of string?

And how would one then do this for mixed expressions containing both "and" and "or", e.g. like this:

myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or  (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt4_l2))'

re.sub(r'\(\w+\)', '', myStr3)

gives me

'(Txt_l1  and Txt2_l1 ) or  (Txt3_l1  and Txt4_l1 )'

but again: how would I obtain myStr3_l2?

Regular expressions are not powerful enough to handle arbitrarily nested expressions (in your case, nested elements in parentheses). You will have to write a parser. Take a look at pyparsing: https://pyparsing.wikispaces.com/
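To illustrate the parser idea without any third-party library, here is a minimal sketch of a hand-written tokenizer that tracks the parenthesis depth. It assumes exactly two nesting levels, that the only operators are the lowercase words "and" and "or", and that every element consists of word characters; the names split_levels, render and OPERATORS are made up for this example. Elements found at any other depth are silently dropped, so treat it as a starting point rather than a complete solution.

```python
import re

# Assumption: these are the only operators that occur in the expressions.
OPERATORS = {"and", "or"}


def render(tokens):
    """Join tokens with spaces, then remove the spaces just inside parentheses."""
    text = " ".join(tokens)
    return text.replace("( ", "(").replace(" )", ")")


def split_levels(expr):
    """Return (level1, level2) strings for a two-level expression."""
    out1, out2 = [], []
    depth = 0
    for tok in re.findall(r"\(|\)|\w+", expr):
        if tok == "(":
            depth += 1
            if depth == 1:              # outer parenthesis: keep in both results
                out1.append(tok)
                out2.append(tok)
        elif tok == ")":
            if depth == 1:              # closing outer parenthesis
                out1.append(tok)
                out2.append(tok)
            depth -= 1
        elif tok in OPERATORS:          # operators belong to both results
            out1.append(tok)
            out2.append(tok)
        elif depth == 1:                # first-level element
            out1.append(tok)
        elif depth == 2:                # second-level element
            out2.append(tok)
    return render(out1), render(out2)


myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
print(split_levels(myStr))
# ('(Txt_l1) or (Txt2_l1)', '(Txt_l2) or (Txt2_l2)')
```

Because the depth counter, not the element names, decides where a token goes, the suffixes of the elements do not matter. The output is rebuilt with single spaces, so irregular spacing in the input is normalized.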

I'm not entirely sure what you want, but I wrote this to strip the parentheses and extract the elements between them.

import re

mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')  # one chunk per operand of "or"
noParens = []
for line in sets:
    # group 1: outer element, group 2: inner element (no surrounding spaces)
    mat = re.match(r'\((.*?) \((.*?)\)\)', line)
    if mat:
        noParens.append(mat.group(1))
        noParens.append(mat.group(2))
print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']

This removes all the parentheses and puts your elements in a list. Here's an alternative way of doing it without regular expressions.

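mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'

# Drop the operator and both kinds of parentheses, then split on whitespace.
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')', '')
mystr = mystr.replace('(', '')
noParens = mystr.split()

print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']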
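If the goal is to get the two strings from the question rather than a flat list, and the nesting never goes deeper than the two levels shown there, a single pattern with two capturing groups might be enough. This is only a sketch: it assumes every pair is written as "Outer (Inner)" with exactly one space in between and that both parts consist of word characters. The names PAIR, level1 and level2 are made up for this example.

```python
import re

# Assumption: each pair looks like "Outer (Inner)" with a single space.
PAIR = re.compile(r"(\w+) \((\w+)\)")


def level1(expr):
    """Keep only the outer element of every pair."""
    return PAIR.sub(r"\1", expr)


def level2(expr):
    """Keep only the inner element of every pair."""
    return PAIR.sub(r"\2", expr)


myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'

print(level1(myStr))   # (Txt_l1) or (Txt2_l1)
print(level2(myStr))   # (Txt_l2) or (Txt2_l2)
print(level2(myStr2))  # (Txt_l2 and Txt2_l2)
```

The operators are left alone because "and" or "or" is never immediately followed by a complete "(Inner)" group. Everything outside the matched pairs, including the original spacing, is kept as it is.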


 