Regex: how to put a substring within parentheses according to negative lookahead & lookbehind?

My goal is to put a substring within parentheses according to some specific rules.

For example, here is some text:

text = 'cake OR ice cream'

My goal is to transform this original text into this:

'cake OR (ice AND cream)'

As you can see, the ultimate goal is to preserve the Boolean logic within the text.

The first step is to insert TO_PARENTHESIS, which we will use as a kind of anchor. I can do this using a negative lookbehind and a negative lookahead:

import regex as re

text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")(\s+)(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)

This finds whitespace and replaces it with TO_PARENTHESIS, but only where it sits between two non-Boolean words (i.e., not right after or right before OR, AND, NOT, a parenthesis, or a double quote).

Here is what we get:

cake OR ice TO_PARENTHESIS cream
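The question uses the third-party `regex` module (installed with `pip install regex`). That choice matters: the standard-library `re` module only accepts fixed-width lookbehinds, and the alternatives `OR`, `AND`, `NOT`, `(`, `)` and `"` have different lengths, so `re` rejects this lookbehind with `re.error` (look-behind requires fixed-width pattern). A minimal sketch, assuming you would rather stay with `re`: split the lookbehind into a chain of fixed-width negative lookbehinds, which reject the same positions. The names `MARKER_SPACE` and `add_markers` are illustrative only.

```python
import re

# Illustrative name. Each negative lookbehind has a fixed width, so stdlib re accepts it;
# chained at the same position they reject what the single alternation would reject.
MARKER_SPACE = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])'  # not right after OR, AND, NOT, ( ) or "
    r'\s+'
    r'(?!OR|AND|NOT|[()"])'               # not right before OR, AND, NOT, ( ) or "
)

def add_markers(text: str) -> str:
    """Hypothetical helper: replace whitespace between two plain words with ' TO_PARENTHESIS '."""
    return MARKER_SPACE.sub(' TO_PARENTHESIS ', text)

print(add_markers('cake OR ice cream'))  # cake OR ice TO_PARENTHESIS cream
```

Like the original, these lookarounds only inspect the characters right next to the space, so a word that merely ends with `OR` (such as `COLOR`) or starts with it (such as `ORANGE`) also blocks the marker.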

Now, my question is: how do I put parentheses at exactly the right points to get something like this:

cake OR (ice TO_PARENTHESIS cream)

I tried: (?<!OR|AND|NOT|\(|\)|")(.*TO_PARENTHESIS.*)(?!OR|AND|NOT|\(|\)|") but this selects the entire text, not just ice TO_PARENTHESIS cream as I expected.

So, two questions:

  • How do I select the correct group?
  • How do I replace the selected group with itself wrapped in parentheses?

The last step would be to replace TO_PARENTHESIS with AND to finally get 'cake OR (ice AND cream)'.

Maybe...

import re

# starting string
text = 'cake OR ice cream'

# first pattern: a space between two lowercase letters (assumes operators such as OR, AND, NOT are always uppercase)
pattern_1 = re.compile(r'([a-z])\s([a-z])') #replace with \1 AND \2

# capture the word (via a boundary) before the AND and the word after the AND
pattern_2 = re.compile(r'(\b\w+\b\sAND\s\b\w+)') #replace with (\1)

# show the starting text
print(text)

# make 'cake OR ice cream' into 'cake OR ice AND cream'
text = pattern_1.sub(r'\1 AND \2', text)

# make 'cake OR ice AND cream' into 'cake OR (ice AND cream)'
text = pattern_2.sub(r'(\1)', text)
print(text)

Input:

cake OR ice cream

Output:

cake OR (ice AND cream)
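A limitation worth knowing: pattern_1 consumes the letter on each side of the space, so two matches cannot share a letter. With single-letter words such as `a b c`, every other space is skipped and the result is `a AND b c`. pattern_2 also wraps only one `word AND word` pair, so `cake OR ice cream please` ends up as `cake OR (ice AND cream) AND please`. The sketch below is a possible variant, assuming the same convention (operators uppercase, search words lowercase): lookarounds check the neighbouring letters without consuming them, and a repeated group wraps a whole chain. `IMPLICIT_AND`, `AND_CHAIN` and `group_implicit_and` are hypothetical names.

```python
import re

# Hypothetical names. The lookarounds test the letters on each side without consuming them,
# so every space in a run like 'a b c' is replaced.
IMPLICIT_AND = re.compile(r'(?<=[a-z])\s+(?=[a-z])')

# A word followed by one or more ' AND word' pieces; the repetition captures the whole chain.
AND_CHAIN = re.compile(r'\b\w+(?:\sAND\s\w+)+')

def group_implicit_and(text: str) -> str:
    """Insert AND between adjacent lowercase words, then parenthesize each AND chain."""
    text = IMPLICIT_AND.sub(' AND ', text)
    return AND_CHAIN.sub(r'(\g<0>)', text)  # \g<0> is the whole match

print(group_implicit_and('cake OR ice cream'))         # cake OR (ice AND cream)
print(group_implicit_and('cake OR ice cream please'))  # cake OR (ice AND cream AND please)
print(group_implicit_and('a b c'))                     # (a AND b AND c)
```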

You can use a pattern that matches TO_PARENTHESIS with a word on each side, and then, in the re.sub callback, wrap the full match in parentheses and replace TO_PARENTHESIS with AND:

\w+(?:\s+TO_PARENTHESIS\s+\w+)+

The pattern matches:

  • \w+ Match 1+ word characters
  • (?: Non-capture group
    • \s+TO_PARENTHESIS Match whitespace chars and TO_PARENTHESIS
    • \s+\w+ Match whitespace chars and 1+ word chars
  • )+ Close the non-capture group and repeat it 1 or more times, so a chain of several marked words is matched
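For comparison with the attempt in the question: there, the lookbehind is already satisfied at position 0 (there is nothing before it to reject), and the greedy `.*` on each side of TO_PARENTHESIS runs to both ends of the string, so the group captures everything. Requiring `\w+` right next to each marker ties the match to the neighbouring words instead. The sketch below shows both; the attempted pattern is rewritten for stdlib `re` by splitting its lookbehind into fixed-width parts, which is an adaptation and not the asker's exact code.

```python
import re

text = 'cake OR ice TO_PARENTHESIS cream'

# The attempted pattern, adapted for stdlib re (lookbehind split into fixed-width parts).
greedy = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])(.*TO_PARENTHESIS.*)(?!OR|AND|NOT|[()"])'
)
# The answer's pattern: a word, then one or more "marker + word" pieces.
bounded = re.compile(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+')

print(greedy.search(text).group(1))  # cake OR ice TO_PARENTHESIS cream  (the whole string)
print(bounded.search(text).group())  # ice TO_PARENTHESIS cream
```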


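import regex as re  # third-party module; stdlib re rejects the variable-width lookbehind below

text = 'cake OR ice cream'
# Step 1: put a TO_PARENTHESIS marker between every two plain words
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")\s+(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
# Step 2: wrap each chain of marked words in parentheses and turn the markers into AND
text = re.sub(
    r"\w+(?:\s+TO_PARENTHESIS\s+\w+)+",
    lambda x: "(" + x.group().replace("TO_PARENTHESIS", "AND") + ")",
    text
)
print(text)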

Output:

cake OR (ice AND cream)

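If you prefer to keep wrapping and renaming as two separate steps (as the question describes them), the wrapping needs no callback: in an `re.sub` replacement string, `\g<0>` stands for the whole match. A sketch, assuming the marker step has already produced the string below and that every TO_PARENTHESIS sits between two plain words, so a global rename afterwards only touches wrapped groups:

```python
import re

text = 'cake OR ice TO_PARENTHESIS cream TO_PARENTHESIS please'

# Step 1: wrap each marker chain; \g<0> in the template inserts the entire match.
wrapped = re.sub(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+', r'(\g<0>)', text)
print(wrapped)  # cake OR (ice TO_PARENTHESIS cream TO_PARENTHESIS please)

# Step 2: rename the markers; safe here because every marker ended up inside a wrapped group.
result = wrapped.replace('TO_PARENTHESIS', 'AND')
print(result)   # cake OR (ice AND cream AND please)
```

The callback in the answer does both steps in one pass, which avoids relying on that assumption about stray markers.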


If the input is

cake OR ice cream please

the output will be

cake OR (ice AND cream AND please)

If you only want to wrap a single pair of words instead of a whole chain, you can shorten the pattern to:

\w+\s+TO_PARENTHESIS \w+
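To see the difference between the two patterns, here is a small comparison on the three-word input; `wrap` is a hypothetical helper that applies the answer's callback with a given pattern. With the single-pair pattern, the second marker has no free word on its left (the `cream` before it was consumed by the first match), so it is left in place:

```python
import re

def wrap(pattern: str, text: str) -> str:
    """Hypothetical helper: parenthesize each match and rename TO_PARENTHESIS inside it."""
    return re.sub(
        pattern,
        lambda m: '(' + m.group().replace('TO_PARENTHESIS', 'AND') + ')',
        text,
    )

marked = 'cake OR ice TO_PARENTHESIS cream TO_PARENTHESIS please'

chain = wrap(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+', marked)
single = wrap(r'\w+\s+TO_PARENTHESIS \w+', marked)

print(chain)   # cake OR (ice AND cream AND please)
print(single)  # cake OR (ice AND cream) TO_PARENTHESIS please
```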

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.