Extract substrings from logical expressions
Let's say I have a string that looks like this:
myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
What I would like to end up with is:
myStr_l1 = '(Txt_l1) or (Txt2_l1)'
and
myStr_l2 = '(Txt_l2) or (Txt2_l2)'
Some properties:
- all "Txt_" elements of the string start with an uppercase letter
- the string can contain many more elements (so there could also be Txt3, Txt4, ...)
- the suffixes '_l1' and '_l2' look different in reality and cannot be used for matching (I chose them for demonstration purposes)
I found a way to get the first part done by using:
myStr_l1 = re.sub(r'\(\w+\)', '', myStr)
which gives me
'(Txt_l1 ) or (Txt2_l1 )'
However, I don't know how to obtain myStr_l2. My idea was to remove everything between two opening parentheses. But when I do something like this:
re.sub(r'\(\w+\(', '', myStr)
the entire string is returned.
re.sub(r'\(.*\(', '', myStr)
removes, of course, far too much and gives me
'Txt2_l2))'
Does anyone have an idea how to get myStr_l2?
When there is an "and" instead of an "or", the strings look slightly different:
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
Then I can still use the command from above:
re.sub(r'\(\w+\)', '', myStr2)
which gives:
'(Txt_l1  and Txt2_l1 )'
but again I fail to get myStr2_l2. How would I do this for these kinds of strings?
And how would one then do this for mixed expressions with "and" and "or", e.g. like this:
myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))'
re.sub(r'\(\w+\)', '', myStr3)
gives me
'(Txt_l1  and Txt2_l1 ) or (Txt3_l1  and Txt4_l1 )'
but again: how would I obtain myStr3_l2?
Regular expressions, at least those of Python's built-in re module, are not powerful enough for arbitrarily nested expressions (in your case: nested elements in parentheses). You will have to write a parser. Look at https://pyparsing.wikispaces.com/
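As a minimal sketch of the parser route that does not assume any pyparsing API, a small hand-written recursive-descent parser can build both strings in one pass. The grammar in the docstring is an assumption inferred from the three example strings in the question: every leaf is a name followed by its parenthesised l2 name, and groups are joined by `and`/`or`. `split_levels` is a hypothetical name, and tokenising with `\w+` assumes the real suffixes consist only of letters, digits and underscores.

```python
import re

# Tokens: single parentheses, or runs of word characters (names and operators).
# Any other character (e.g. whitespace) is skipped by findall.
TOKEN_RE = re.compile(r"\(|\)|\w+")
OPERATORS = ("and", "or")


def split_levels(expr):
    """Split e.g. '(A_l1 (A_l2)) or (B_l1 (B_l2))' into its l1 and l2 expressions.

    Assumed grammar (inferred from the question's examples):
        expr    := operand (("and" | "or") operand)*
        operand := "(" expr ")" | term
        term    := NAME "(" NAME ")"   # outer NAME is the l1 part, inner NAME the l2 part
    """
    tokens = TOKEN_RE.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            want = expected if expected is not None else "a token"
            raise ValueError(f"expected {want!r}, got {tok!r} at position {pos}")
        pos += 1
        return tok

    def take_name():
        tok = take()
        if tok in ("(", ")") or tok in OPERATORS:
            raise ValueError(f"expected a name at position {pos - 1}, got {tok!r}")
        return tok

    def parse_expr():
        # Operands joined by and/or; the operator is copied into both outputs.
        l1, l2 = parse_operand()
        while peek() in OPERATORS:
            op = take()
            r1, r2 = parse_operand()
            l1, l2 = f"{l1} {op} {r1}", f"{l2} {op} {r2}"
        return l1, l2

    def parse_operand():
        if peek() == "(":
            # A parenthesised group: keep the parentheses in both outputs.
            take("(")
            l1, l2 = parse_expr()
            take(")")
            return f"({l1})", f"({l2})"
        outer = take_name()  # l1 part
        take("(")
        inner = take_name()  # l2 part
        take(")")
        return outer, inner

    result = parse_expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return result


myStr3 = ('(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or '
          '(Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))')
myStr3_l1, myStr3_l2 = split_levels(myStr3)
print(myStr3_l1)  # (Txt_l1 and Txt2_l1) or (Txt3_l1 and Txt4_l1)
print(myStr3_l2)  # (Txt_l2 and Txt2_l2) or (Txt3_l2 and Txt2_l2)
```

Because the parser follows the grouping of the input instead of trying to balance parentheses with a regex, the `or`-only, `and`-only and mixed examples all go through the same code path; input that does not fit the assumed grammar raises a `ValueError` rather than being silently mangled.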
I'm not entirely sure what you want, but I wrote this to pull out everything between the parentheses.
import re
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')  # one chunk per top-level "or" operand
noParens = []
for line in sets:
    # group 1: the outer name (plus a trailing space), group 2: the inner name plus closing parens
    mat = re.match(r'\((.* )\((.*\)\))', line)
    if mat:
        noParens.append(mat.group(1).strip())  # drop the trailing space
        noParens.append(mat.group(2).replace(')', ''))
print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']
This takes all the parentheses away and puts your elements in a list. Here's an alternate way of doing it without regular expressions:
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
# turn the operator into a plain separator, then delete both kinds of parenthesis
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')', '')
mystr = mystr.replace('(', '')
noParens = mystr.split()  # split on whitespace into a list of names
print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']
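Both versions above produce a flat list, which drops the and/or structure the question wants to keep. In the question's examples, though, the `_l2` part is never nested any further: it is always a single name in parentheses directly after its `_l1` name. Under that assumption, a regex that matches each `Name (Name)` pair, instead of trying to match balanced parentheses, may be enough and leaves every `and`/`or` and every outer parenthesis untouched. This is a sketch that relies on the question's statement that the elements start with an uppercase letter (so the lowercase `and`/`or` can never be taken for a name) and on names consisting of word characters only; `PAIR_RE` and `split_pairs` are hypothetical names. If the real data can nest deeper than this, a proper parser is still the safer choice.

```python
import re

# One l1 name directly followed by its parenthesised l2 name, e.g. "Txt_l1 (Txt_l2)".
# Assumption: both names start with an uppercase letter, so "and"/"or" never match.
PAIR_RE = re.compile(r"([A-Z]\w*)\s*\(([A-Z]\w*)\)")


def split_pairs(expr):
    """Return (l1, l2) by replacing every pair with its outer (l1) or inner (l2) name."""
    return PAIR_RE.sub(r"\1", expr), PAIR_RE.sub(r"\2", expr)


myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
myStr2_l1, myStr2_l2 = split_pairs(myStr2)
print(myStr2_l1)  # (Txt_l1 and Txt2_l1)
print(myStr2_l2)  # (Txt_l2 and Txt2_l2)
```

Unlike `re.sub(r'\(\w+\)', '', ...)`, the capture groups keep whichever half of the pair is wanted and consume the space between them, so no stray spaces are left behind.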