Regex: how to put a substring within parentheses according to negative lookahead & lookbehind?
My goal is to put a substring within parentheses according to some specific rules.
For example, here is a text:
text = 'cake OR ice cream'
And my goal is to transform this original text into this:
'cake OR (ice AND cream)'
As you can see, the ultimate goal is to preserve some Boolean logic within the text.
The first step is to add a TO_PARENTHESIS marker that we will use as some sort of anchor. I can do it using negative lookbehind & lookahead:
import regex as re
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")(\s+)(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
This finds whitespace and replaces it with TO_PARENTHESIS, but only the whitespace that sits between two non-Boolean keywords (and non-special characters).
Here is what we get:
cake OR ice TO_PARENTHESIS cream
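The substitution above needs the third-party regex module, because the standard-library re only accepts fixed-width lookbehinds and the alternatives OR, AND and NOT differ in length. As a minimal sketch, assuming you would rather stay with re, the single lookbehind can be split into several fixed-width ones. The names GAP and mark_gaps are hypothetical and not part of the original post.

```python
import re

# Each negative lookbehind below has a fixed width, so the stdlib engine accepts it.
GAP = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])'  # no keyword or special char right before the gap
    r'\s+'
    r'(?!OR|AND|NOT|[()"])'               # no keyword or special char right after the gap
)

def mark_gaps(text):
    """Replace whitespace between two plain words with the anchor token."""
    return GAP.sub(' TO_PARENTHESIS ', text)

print(mark_gaps('cake OR ice cream'))  # cake OR ice TO_PARENTHESIS cream
```

Several negative lookbehinds placed one after another must all succeed, which is equivalent to one lookbehind over the alternation.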
Now, my question is: how do I put parentheses at these very specific points, to get something like this:
cake OR (ice TO_PARENTHESIS cream)
I tried:
(?<!OR|AND|NOT|\(|\)|")(.*TO_PARENTHESIS.*)(?!OR|AND|NOT|\(|\)|")
but this selects the entire text, not just ice TO_PARENTHESIS cream as I expected.
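To illustrate the behaviour described above: .* is greedy and is allowed to cross spaces and keywords, so it expands to the whole line on both sides of the marker. The sketch below uses the standard-library re and leaves out the lookarounds (an assumption made only to keep the example short; at the start and end of the string they do not restrict anything). The second pattern, which limits each side to a single word, is shown purely for contrast.

```python
import re

text = 'cake OR ice TO_PARENTHESIS cream'

# Greedy `.*` on both sides of the marker swallows the whole line.
greedy = re.search(r'.*TO_PARENTHESIS.*', text)
print(greedy.group())  # cake OR ice TO_PARENTHESIS cream

# Limiting each side to one word keeps the match local to the marker.
narrow = re.search(r'\w+\s+TO_PARENTHESIS\s+\w+', text)
print(narrow.group())  # ice TO_PARENTHESIS cream
```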
So two questions:
And the last step would be to replace TO_PARENTHESIS with AND to finally get our 'cake OR (ice AND cream)'.
Maybe...
import re
# starting string
text = 'cake OR ice cream'
# first pattern: a whitespace character between two lowercase letters (assumes the keywords OR, AND, NOT, etc. are always uppercase)
pattern_1 = re.compile(r'([a-z])\s([a-z])') #replace with \1 AND \2
# capture the word (via a boundary) before the AND and the word after the AND
pattern_2 = re.compile(r'(\b\w+\b\sAND\s\b\w+)') #replace with (\1)
# show the starting text
print(text)
# make 'cake OR ice cream' into 'cake OR ice AND cream'
text = pattern_1.sub(r'\1 AND \2', text)
# make 'cake OR ice AND cream' into 'cake OR (ice AND cream)'
text = pattern_2.sub(r'(\1)', text)
print(text)
Input:
cake OR ice cream
Output:
cake OR (ice AND cream)
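With three or more adjacent lowercase words, the second pattern above wraps only the first pair: 'cake OR ice cream please' becomes 'cake OR (ice AND cream) AND please'. As a hedged sketch of one possible variant (not part of the answer above), the second step can match a whole chain of AND-joined words instead. The function name group_implicit_and is hypothetical.

```python
import re

def group_implicit_and(text):
    """Insert AND between adjacent lowercase words, then parenthesise each chain."""
    # Step 1: a whitespace character between two lowercase letters becomes ' AND '.
    text = re.sub(r'([a-z])\s([a-z])', r'\1 AND \2', text)
    # Step 2 (variant): wrap a whole run 'w AND w AND w ...' in one pair of parentheses.
    return re.sub(r'\b\w+(?:\sAND\s\w+)+', r'(\g<0>)', text)

print(group_implicit_and('cake OR ice cream'))         # cake OR (ice AND cream)
print(group_implicit_and('cake OR ice cream please'))  # cake OR (ice AND cream AND please)
```

Like the two-pattern version, this also wraps AND expressions that were already present in the input, which may or may not be what you want.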
You can use a pattern to match TO_PARENTHESIS surrounded by words, and then, in the callback of re.sub, place the full match between parentheses and replace TO_PARENTHESIS with AND:
\w+(?:\s+TO_PARENTHESIS\s+\w+)+
The pattern matches:
\w+ : Match 1+ word characters
(?: : Non-capture group
\s+TO_PARENTHESIS : Match whitespace chars and TO_PARENTHESIS
\s+\w+ : Match whitespace chars and 1+ word chars
)+ : Close the non-capture group and repeat it 1 or more times for multiple matches
import regex as re
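text = 'cake OR ice cream'
# mark the gaps between plain words with the TO_PARENTHESIS anchor
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")\s+(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
# wrap each run of marked words in parentheses and turn the anchor into AND
text = re.sub(
    r"\w+(?:\s+TO_PARENTHESIS\s+\w+)+",
    lambda x: "(" + x.group().replace("TO_PARENTHESIS", "AND") + ")",
    text
)
print(text)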
Output:
cake OR (ice AND cream)
See a Python demo.
If the input is
cake OR ice cream please
the output will be
cake OR (ice AND cream AND please)
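The two steps can be sketched as one reusable function. This version assumes the standard-library re rather than regex, so the variable-width lookbehind is split into fixed-width ones. The names GAP, RUN, wrap and add_implicit_and are hypothetical.

```python
import re

# Whitespace that is not adjacent to a Boolean keyword, parenthesis or quote.
GAP = re.compile(r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])\s+(?!OR|AND|NOT|[()"])')
# A run of words joined by the anchor token.
RUN = re.compile(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+')

def wrap(match):
    """Callback for re.sub: swap the anchor for AND and parenthesise the run."""
    return '(' + match.group().replace('TO_PARENTHESIS', 'AND') + ')'

def add_implicit_and(text):
    marked = GAP.sub(' TO_PARENTHESIS ', text)
    return RUN.sub(wrap, marked)

for sample in ('cake OR ice cream', 'cake OR ice cream please', 'cake OR pie'):
    print(add_implicit_and(sample))
# cake OR (ice AND cream)
# cake OR (ice AND cream AND please)
# cake OR pie
```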
If you want to do a single replacement only instead of multiple, you can shorten the pattern to:
\w+\s+TO_PARENTHESIS \w+
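As a small comparison, assuming an input that already carries two markers, the sketch below runs both patterns over the same string with the standard-library re. The shortened pattern is written here with \s+ on both sides of the marker. It wraps one pair per match, so on a run of three words the second marker is left in place.

```python
import re

marked = 'cake OR ice TO_PARENTHESIS cream TO_PARENTHESIS please'

def to_and(match):
    # Parenthesise the match and turn every anchor inside it into AND.
    return '(' + match.group().replace('TO_PARENTHESIS', 'AND') + ')'

# Repeated group: the whole run of words is wrapped once.
full = re.sub(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+', to_and, marked)
print(full)  # cake OR (ice AND cream AND please)

# Shortened pattern: only the first pair is wrapped.
pair = re.sub(r'\w+\s+TO_PARENTHESIS\s+\w+', to_and, marked)
print(pair)  # cake OR (ice AND cream) TO_PARENTHESIS please
```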