
Regex: how to put a substring within parentheses according to negative lookahead & lookbehind?

My goal is to put a substring within parentheses according to some specific rules.

For example, here is a piece of text:

text = 'cake OR ice cream'

I want to turn this original text into:

'cake OR (ice AND cream)'

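As you can see, the end goal is to preserve some Boolean logic in the text.

The first step is to add TO_PARENTHESIS, which we will use as a kind of anchor. I can do this with a negative lookbehind and lookahead: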

# Third-party `regex` module (pip install regex); it accepts the variable-width lookbehind below
import regex as re

text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")(\s+)(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)

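This finds whitespace and replaces it with TO_PARENTHESIS, but only the whitespace that sits between two tokens that are neither Boolean keywords nor special characters (parentheses or double quotes).

This is what we get: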

cake OR ice TO_PARENTHESIS cream
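
The pattern above needs the third-party regex module because the alternatives inside the lookbehind have different lengths. As a minimal sketch, assuming only the standard-library re is available, the same step can be approximated by chaining several fixed-width lookbehinds. The helper name mark_gaps is hypothetical and does not come from the question.

```python
import re

# Chained fixed-width lookbehinds stand in for the single
# variable-width lookbehind used with the `regex` module.
GAP = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])'   # not right after a keyword, parenthesis or quote
    r'\s+'                                 # the whitespace to replace
    r'(?!OR|AND|NOT|[()"])'                # not right before a keyword, parenthesis or quote
)

def mark_gaps(text):
    """Hypothetical helper: replace qualifying whitespace with the anchor."""
    return GAP.sub(' TO_PARENTHESIS ', text)

print(mark_gaps('cake OR ice cream'))
```

Like the original pattern, this checks raw characters rather than whole words, so an uppercase word that merely starts or ends with OR, AND or NOT is treated as a keyword too.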

Now, my question is: how can I put the parentheses at very specific points, to get something like this:

cake OR (ice TO_PARENTHESIS cream)

I tried (?<!OR|AND|NOT|\(|\)|")(.*TO_PARENTHESIS.*)(?!OR|AND|NOT|\(|\)|"), but this selects the whole text rather than just ice TO_PARENTHESIS cream as expected.
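
A small reproduction of that behaviour, sketched with the standard-library re. The lookbehind is split into fixed-width pieces here, which is an adaptation rather than the exact pattern above. At position 0 nothing precedes the match, so the lookbehind succeeds, and the greedy .* on each side of TO_PARENTHESIS then expands to both ends of the string.

```python
import re

text = 'cake OR ice TO_PARENTHESIS cream'

# Adapted from the attempted pattern; lookbehind split for stdlib `re`.
attempt = re.compile(
    r'(?<!OR)(?<!AND)(?<!NOT)(?<![()"])'
    r'(.*TO_PARENTHESIS.*)'
    r'(?!OR|AND|NOT|[()"])'
)

match = attempt.search(text)
print(match.group(1))   # the captured group covers the whole string
print(match.span())
```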

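So, two questions:

  • How do I select the right group?
  • How do I replace just the selected group with itself wrapped in parentheses?

The final step is to replace TO_PARENTHESIS with AND, to end up with 'cake OR (ice AND cream)'.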

Perhaps something like this:

import re

# starting string
text = 'cake OR ice cream'

# first pattern: a single whitespace character between two lowercase letters
# (assumes operators such as OR, AND, NOT are always uppercase and search terms are lowercase)
pattern_1 = re.compile(r'([a-z])\s([a-z])')  # replace with \1 AND \2

# second pattern: the word before the AND and the word after it (via word boundaries)
pattern_2 = re.compile(r'(\b\w+\b\sAND\s\b\w+)')  # replace with (\1)

# show the starting text
print(text)

# make 'cake OR ice cream' into 'cake OR ice AND cream'
text = pattern_1.sub(r'\1 AND \2', text)

# make 'cake OR ice AND cream' into 'cake OR (ice AND cream)'
text = pattern_2.sub(r'(\1)', text)
print(text)

Input:

cake OR ice cream

Output:

cake OR (ice AND cream)
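
As a quick check of how these two patterns behave on a longer run of words, the sketch below wraps them in a hypothetical function, two_pass (the name is an assumption, not part of the answer), and tries a three-word input. Only the first pair ends up in parentheses, because pattern_2 matches exactly one AND at a time.

```python
import re

pattern_1 = re.compile(r'([a-z])\s([a-z])')
pattern_2 = re.compile(r'(\b\w+\b\sAND\s\b\w+)')

def two_pass(text):
    # Hypothetical wrapper around the two substitutions shown above.
    text = pattern_1.sub(r'\1 AND \2', text)
    return pattern_2.sub(r'(\1)', text)

print(two_pass('cake OR ice cream'))
print(two_pass('cake OR ice cream please'))
```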

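You can use a pattern that matches TO_PARENTHESIS surrounded by words, then, in a callback passed to re.sub, wrap the full match in parentheses and replace TO_PARENTHESIS with AND.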

\w+(?:\s+TO_PARENTHESIS\s+\w+)+

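The pattern matches:

  • \w+ Match 1+ word characters
  • (?: Non-capturing group
    • \s+TO_PARENTHESIS Match 1+ whitespace characters and TO_PARENTHESIS
    • \s+\w+ Match 1+ whitespace characters and 1+ word characters
  • )+ Close the non-capturing group and repeat it 1 or more times, so a run of several terms is matched as a whole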
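
To make the breakdown above concrete, here is a minimal sketch that lists what the pattern matches in two already-marked strings. The sample strings are illustrative assumptions; since the pattern has no lookbehind, the standard-library re is enough.

```python
import re

pattern = re.compile(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+')

samples = [
    'cake OR ice TO_PARENTHESIS cream',
    'a TO_PARENTHESIS b OR c TO_PARENTHESIS d TO_PARENTHESIS e',
]

# With only non-capturing groups, findall returns the full matches.
matches = [pattern.findall(s) for s in samples]
for s, found in zip(samples, matches):
    print(s, '->', found)
```

Each run of terms joined by TO_PARENTHESIS comes back as one match, and a Boolean keyword such as OR ends the run.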


import regex as re

text = 'cake OR ice cream'
# Step 1: mark the whitespace between two plain terms
text = re.sub(r'(?<!OR|AND|NOT|\(|\)|")\s+(?!OR|AND|NOT|\(|\)|")', r' TO_PARENTHESIS ', text)
# Step 2: wrap each run of marked terms in parentheses and turn the marker into AND
text = re.sub(
    r"\w+(?:\s+TO_PARENTHESIS\s+\w+)+",
    lambda x: "(" + x.group().replace("TO_PARENTHESIS", "AND") + ")",
    text
)
print(text)

Output

cake OR (ice AND cream)



If the input is

cake OR ice cream please

the output will be

cake OR (ice AND cream AND please)

If you only want to wrap a single pair of words instead of a whole run of them, you can shorten the pattern to:

\w+\s+TO_PARENTHESIS\s+\w+
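
The difference between the repeating pattern and the shortened one shows up once three terms sit next to each other. The sketch below is a hedged comparison on an already-marked string; the callback name wrap is hypothetical, and the standard-library re is sufficient because neither pattern uses a lookbehind.

```python
import re

def wrap(match):
    # Put the whole match in parentheses and swap the anchor for AND.
    return '(' + match.group().replace('TO_PARENTHESIS', 'AND') + ')'

marked = 'cake OR ice TO_PARENTHESIS cream TO_PARENTHESIS please'

long_form = re.sub(r'\w+(?:\s+TO_PARENTHESIS\s+\w+)+', wrap, marked)
short_form = re.sub(r'\w+\s+TO_PARENTHESIS\s+\w+', wrap, marked)

print(long_form)
print(short_form)
```

With the shortened pattern the second TO_PARENTHESIS is left in the text, so that form only suits input with at most two adjacent terms.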

