Regex: how to put a substring within parentheses according to negative lookahead & lookbehind?
My goal is to put a substring inside parentheses according to some specific rules.
For example, here is a piece of text:
text = 'cake OR ice cream'
My goal is to convert this original text into the following:
'cake OR (ice AND cream)'
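As you can see, the end goal is to preserve some Boolean logic in the text.
The first step is to add TO_PARENTHESIS, which we will use as a kind of anchor. I can do this with a negative lookbehind and a negative lookahead: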
import regex as re
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")(\s+)(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
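This finds whitespace and replaces it with TO_PARENTHESIS, but only the whitespace between two words that are neither Boolean keywords nor special characters.
This is what we get: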
cake OR ice TO_PARENTHESIS cream
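Note that `import regex as re` above loads the third-party regex package. The standard-library re module rejects a lookbehind whose alternatives differ in length ("look-behind requires fixed-width pattern"). A minimal stdlib-only sketch of the same step, assuming chained fixed-width lookbehinds are an acceptable substitute (the names ANCHOR and add_anchors are made up for illustration):

```python
import re

# Each lookbehind has a fixed width, so the stdlib `re` module accepts them.
ANCHOR = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])'  # not preceded by a keyword or special char
    r'\s+'                                 # the whitespace to replace
    r'(?!OR|AND|NOT|[()"])'                # not followed by a keyword or special char
)

def add_anchors(text):
    """Replace whitespace between two plain words with the anchor."""
    return ANCHOR.sub(' TO_PARENTHESIS ', text)

print(add_anchors('cake OR ice cream'))  # cake OR ice TO_PARENTHESIS cream
```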
Now, my question is: how can I put parentheses at very specific points, to get something like this:
cake OR (ice TO_PARENTHESIS cream)
I tried: (?<!OR|AND|NOT|\(|\)|")(.*TO_PARENTHESIS.*)(?!OR|AND|NOT|\(|\)|")
But this selects the whole text, not just ice TO_PARENTHESIS cream as expected.
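The attempted pattern grabs everything because .* is greedy and nothing in the pattern stops it at a word edge, while the negative lookarounds are trivially satisfied at the very start and end of the string. A small sketch of the difference, with the lookarounds left out for brevity (an assumption made so that it runs on the standard-library re module):

```python
import re

text = 'cake OR ice TO_PARENTHESIS cream'

# Greedy `.*` on both sides expands to the start and end of the line.
greedy = re.search(r'.*TO_PARENTHESIS.*', text)
print(greedy.group())   # cake OR ice TO_PARENTHESIS cream

# `\w+` stops at the first non-word character, so only the neighbours match.
bounded = re.search(r'\w+\s+TO_PARENTHESIS\s+\w+', text)
print(bounded.group())  # ice TO_PARENTHESIS cream
```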
So, two questions:
The last step is to replace TO_PARENTHESIS with AND, so that we finally get our 'cake OR (ice AND cream)'.
Maybe...
import re
# starting string
text = 'cake OR ice cream'
# first pattern that finds the space between two lowercase letters (assuming it's always OR, AND, NOT, etc.)
pattern_1 = re.compile(r'([a-z])\s([a-z])') #replace with \1 AND \2
# capture the word (via a boundary) before the AND and the word after the AND
pattern_2 = re.compile(r'(\b\w+\b\sAND\s\b\w+)') #replace with (\1)
# show the starting text
print(text)
# make 'cake OR ice cream' into 'cake OR ice AND cream'
text = pattern_1.sub(r'\1 AND \2', text)
# make 'cake OR ice AND cream' into 'cake OR (ice AND cream)'
text = pattern_2.sub(r'(\1)', text)
print(text)
Input:
cake OR ice cream
Output:
cake OR (ice AND cream)
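One caveat with pattern_1: the two capture groups consume the letters on each side of the space, so a one-letter word cannot serve as both the end of one match and the start of the next ('a b c' becomes 'a AND b c'). A hedged variation that uses zero-width lookarounds instead (GAP and join_with_and are illustrative names, not part of the answer above):

```python
import re

# Lookarounds check the neighbouring letters without consuming them,
# so consecutive gaps such as the two in 'a b c' are both matched.
GAP = re.compile(r'(?<=[a-z])\s+(?=[a-z])')

def join_with_and(text):
    """Insert AND between two lowercase words separated by whitespace."""
    return GAP.sub(' AND ', text)

print(join_with_and('cake OR ice cream'))  # cake OR ice AND cream
print(join_with_and('a b c'))              # a AND b AND c

# For comparison, the capture-group form on the same input:
print(re.sub(r'([a-z])\s([a-z])', r'\1 AND \2', 'a b c'))  # a AND b c
```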
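You can use a pattern that matches TO_PARENTHESIS surrounded by words, and then, in the callback of re.sub, put the full match between parentheses and replace TO_PARENTHESIS with AND.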
\w+(?:\s+TO_PARENTHESIS\s+\w+)+
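The pattern matches:
\w+ Match 1+ word characters
(?: Non-capturing group
\s+TO_PARENTHESIS Match whitespace characters and TO_PARENTHESIS
\s+\w+ Match whitespace characters and 1+ word characters
)+ Close the non-capturing group and repeat it 1 or more times
import regex as re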
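text = 'cake OR ice cream'
# step 1: mark the whitespace between two plain words with the anchor
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")\s+(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
# step 2: wrap each run of anchored words in parentheses and turn the anchors into AND
text = re.sub(
    r"\w+(?:\s+TO_PARENTHESIS\s+\w+)+",
    lambda x: "(" + x.group().replace("TO_PARENTHESIS", "AND") + ")",
    text
)
print(text)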
Output
cake OR (ice AND cream)
See a Python demo.
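The lambda can also be written as a named callback, which may be easier to read and test. A minimal sketch on the standard-library re module, assuming the input already contains the anchors from the first step (WRAP, wrap_group and wrap_all are hypothetical names):

```python
import re

WRAP = re.compile(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+')

def wrap_group(match):
    # match.group() is one full run, e.g. 'ice TO_PARENTHESIS cream'
    return '(' + match.group().replace('TO_PARENTHESIS', 'AND') + ')'

def wrap_all(anchored_text):
    """Parenthesize every run of anchored words in the text."""
    return WRAP.sub(wrap_group, anchored_text)

print(wrap_all('cake OR ice TO_PARENTHESIS cream AND hot TO_PARENTHESIS dog'))
# cake OR (ice AND cream) AND (hot AND dog)
```

re.sub calls the function once per non-overlapping match, so each run of words gets its own pair of parentheses.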
If the input is
cake OR ice cream please
the output will be
cake OR (ice AND cream AND please)
If you only want a single replacement per match instead of multiple, you can shorten the pattern to:
\w+\s+TO_PARENTHESIS\s+\w+
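To make the difference between the two patterns concrete, here is a hedged comparison on the three-word case. The helper name wrap is hypothetical, and the input is assumed to already contain the anchors from the first step:

```python
import re

def wrap(pattern, anchored_text):
    # Parenthesize each match and turn its anchors into AND.
    return re.sub(
        pattern,
        lambda m: "(" + m.group().replace("TO_PARENTHESIS", "AND") + ")",
        anchored_text,
    )

anchored = "cake OR ice TO_PARENTHESIS cream TO_PARENTHESIS please"

repeated = wrap(r"\w+(?:\s+TO_PARENTHESIS\s+\w+)+", anchored)
single = wrap(r"\w+\s+TO_PARENTHESIS\s+\w+", anchored)

print(repeated)  # cake OR (ice AND cream AND please)
print(single)    # cake OR (ice AND cream) TO_PARENTHESIS please
```

So the shortened form looks safe only when no more than two plain words sit next to each other; with three or more, an anchor is left behind in the text.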