Extracting substrings from logical expressions

Question

Let's say I have a string that looks like this:

myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'

What I would like to obtain in the end is:

myStr_l1 = '(Txt_l1) or (Txt2_l1)'

and

myStr_l2 = '(Txt_l2) or (Txt2_l2)'

Some properties:

all "Txt_" elements of the string start with an uppercase letter
the string can contain many more elements (so there could also be Txt3, Txt4, ...)
the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)

I found a way to get the first part done by using:

myStr_l1 = re.sub(r'\(\w+\)', '', myStr)

which gives me

'(Txt_l1 ) or (Txt2_l1 )'

However, I don't know how to obtain myStr_l2. My idea was to remove everything between two opening parentheses. But when I do something like this:

re.sub(r'\(w+\(', '', myStr)

the entire string is returned.

re.sub(r'\(.*\(', '', myStr)

removes, of course, far too much and gives me

'Txt2_l2))'

Does anyone have an idea how to get myStr_l2?

When there is an "and" instead of an "or", the strings look slightly different:

myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'

Then I can still use the command from above:

re.sub(r'\(\w+\)', '', myStr2)

which gives:

'(Txt_l1  and Txt2_l1 )'

but I again fail to get myStr2_l2. How would I do this for these kinds of strings?

And how would one then do this for mixed expressions with "and" and "or", e.g. like this:

myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or  (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))' 

re.sub(r'\(\w+\)', '', myStr3)

gives me

'(Txt_l1  and Txt2_l1 ) or  (Txt3_l1  and Txt4_l1 )'

but again: how would I obtain myStr3_l2?

Answer 1

Regular expressions are not powerful enough for arbitrarily nested expressions (in your case: nested elements in parentheses). You will have to write a parser. Look at https://pyparsing.wikispaces.com/
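As a rough illustration of the parser idea (hand-written here rather than built with pyparsing), a small recursive-descent sketch could look like the following. It assumes that names consist of word characters only, that the only operators are "and" and "or", and that every outer name is directly followed by its inner name in parentheses. split_levels and its helper functions are hypothetical names chosen for this example, not part of any library.

```python
import re

# Tokens: an opening parenthesis, a closing parenthesis, or a run of word characters.
# Assumption: element names and operators contain word characters only.
TOKEN = re.compile(r"\(|\)|\w+")
OPERATORS = {"and", "or"}


def split_levels(expression):
    """Return (level1, level2) strings for an expression such as
    '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'."""
    tokens = TOKEN.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        # Consume one token; with expected=None any non-parenthesis token is accepted.
        nonlocal pos
        token = peek()
        is_name = token is not None and token not in ("(", ")")
        if (expected is None and not is_name) or (expected is not None and token != expected):
            raise ValueError(f"unexpected token {token!r} at position {pos}")
        pos += 1
        return token

    def parse_expr():
        # expr := term (operator term)*
        outer, inner = parse_term()
        while peek() in OPERATORS:
            op = take()
            next_outer, next_inner = parse_term()
            outer = f"{outer} {op} {next_outer}"
            inner = f"{inner} {op} {next_inner}"
        return outer, inner

    def parse_term():
        # term := '(' expr ')'  |  NAME '(' NAME ')'
        if peek() == "(":
            take("(")
            outer, inner = parse_expr()
            take(")")
            return f"({outer})", f"({inner})"
        outer = take()
        take("(")
        inner = take()
        take(")")
        return outer, inner

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {peek()!r} at position {pos}")
    return result


myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
level1, level2 = split_levels(myStr)
print(level1)  # (Txt_l1) or (Txt2_l1)
print(level2)  # (Txt_l2) or (Txt2_l2)
```

Because the output is rebuilt from tokens, whitespace is normalized to single spaces. The same function should also handle the "and" and mixed variants from the question, since grouping parentheses and operators are handled recursively.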

Answer 2

I'm not entirely sure what you want, but I wrote this to strip everything between the parentheses.

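import re

mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
# Split the expression into its ' or '-separated groups
sets = mystr.split(' or ')
noParens = []
for line in sets:
    # group(1) is the outer element, group(2) the inner element plus '))'
    mat = re.match(r'\((.* )\((.*\)\))', line)
    if mat:
        # strip() drops the trailing space captured in group(1)
        noParens.append(mat.group(1).strip())
        noParens.append(mat.group(2).replace(')', ''))
print(noParens)  # ['Txt_l1', 'Txt_l2', 'Txt2_l1', 'Txt2_l2']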

This takes all the parentheses away and puts your elements in a list. Here's an alternate way of doing it without using regular expressions.

mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
noParens = []

mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')','')
mystr = mystr.replace('(','')
noParens = mystr.split()

print(noParens)
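If the goal is the two strings from the question rather than a flat list, one possible regex sketch is to match each "outer (inner)" pair and keep only one half of it. This assumes names consist of word characters only and that every outer name is immediately followed by its inner name in parentheses; PAIR, outer_only and inner_only are hypothetical names for this example.

```python
import re

# One outer name followed by an inner name in its own parentheses,
# e.g. "Txt_l1 (Txt_l2)". Assumption: names contain word characters only.
PAIR = re.compile(r"(\w+)\s*\((\w+)\)")


def outer_only(expression):
    # Replace each pair with its outer name (group 1).
    return PAIR.sub(r"\1", expression)


def inner_only(expression):
    # Replace each pair with its inner name (group 2).
    return PAIR.sub(r"\2", expression)


myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
print(outer_only(myStr))  # (Txt_l1) or (Txt2_l1)
print(inner_only(myStr))  # (Txt_l2) or (Txt2_l2)

myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
print(inner_only(myStr2))  # (Txt_l2 and Txt2_l2)
```

This keeps the operators and the grouping parentheses as they are. It would misfire on input where an operator is directly followed by a single parenthesized name (for example "or (Txt3)"), so it is only a fit for the formats shown in the question.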

Answer 1
0 2015-07-14 10:39:06

Answer 2
0 2015-07-14 13:08:53
