How to replace a string with parts of it between markers (regex)?
I have a text, let's say:
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
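I want to replace [[time (sample)|tempor]] with tempor. The structure is always the same: [[string to remove|string to extract]], and it can occur multiple times in the text.
I tried a regex, but without success; it cut off half of the text: re.sub(r'\[.*?\|', '', text)
How can I replace the string?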
You can use the following regex to capture only the relevant field:
r'\[\[[\w\s\(\)]+?\|(.+?)\]\]'
import re
regex = r'\[\[[\w\s\(\)]+?\|(.+?)\]\]'
text = '''
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
'''
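# Keep the brackets but only the captured display text.
# The replacement is a raw string so that '\g' is not treated as an invalid escape.
# Use r'\g<1>' instead to drop the brackets as well.
txt = re.sub(regex, r'[[\g<1>]]', text)
print(txt)
Output: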
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[tempor]] incididunt ut [[labore]] et dolore magna aliqua.
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, sed do eiusmod [[tempor]] incididunt ut [[labore]] et dolore magna aliqua.
See the Regex101 demo.
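The character class [\w\s\(\)] in the pattern above only accepts word characters, whitespace and parentheses before the '|', so a target containing, say, a hyphen would not match. The following is a minimal sketch of a more permissive variant that also returns the bare text the question asked for. The names PIPED and unwrap_piped, and the hyphenated sample, are hypothetical and not part of the original answer.

```python
import re

# Hypothetical variant of the pattern above: the part before '|' may contain
# any character except '[', ']' and '|', not only word characters,
# whitespace and parentheses.
PIPED = re.compile(r'\[\[[^\[\]|]+\|([^\[\]|]+)\]\]')

def unwrap_piped(text):
    """Replace each [[target|label]] with its bare label; leave [[plain]] links alone."""
    return PIPED.sub(r'\1', text)

sample = "sed do eiusmod [[time-zone (sample)|tempor]] incididunt ut [[labore]]"
print(unwrap_piped(sample))
# sed do eiusmod tempor incididunt ut [[labore]]
```

Because neither part may contain brackets, a match can never start in one link and end in another.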
Use
\[\[(?:(?!\[\[)[^|])*\|(.*?)]]
Replace with [[\1]] or \1, depending on your requirements.
See the proof.
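As a rough illustration in Python, the sketch below runs the attempt from the question and the pattern above on the same single-line input. The variable names are made up for this example. The attempt from the question starts at the first '[' of [[amet]] and runs to the first '|', which is why part of the text disappears; the pattern above refuses to cross a second '[['.

```python
import re

sample = ("Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, "
          "sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] "
          "et dolore magna aliqua.")

# Attempt from the question: matches from the first '[' up to the first '|'.
naive = re.sub(r'\[.*?\|', '', sample)

# Pattern from this answer, replacing each piped link with group 1 only.
tempered = re.sub(r'\[\[(?:(?!\[\[)[^|])*\|(.*?)]]', r'\1', sample)

print(naive)
# Lorem ipsum dolor sit tempor]] incididunt ut [[labore]] et dolore magna aliqua.
print(tempered)
# Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, sed do eiusmod tempor incididunt ut [[labore]] et dolore magna aliqua.
```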
Explanation
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \[                       '['
--------------------------------------------------------------------------------
      \[                       '['
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [^|]                     any character except: '|'
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  ]]                       ']]'
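The same breakdown can be kept next to the pattern itself with Python's re.VERBOSE flag. This is only a sketch: the name LINK_RE and the short sample string are assumptions made for illustration.

```python
import re

# The pattern explained above, written out with re.VERBOSE so that each
# part carries its own comment. LINK_RE is a hypothetical name.
LINK_RE = re.compile(r"""
    \[\[            # literal '[['
    (?:             # non-capturing group, repeated 0 or more times (greedy)
        (?!\[\[)    #   look ahead: the next two characters are not '[['
        [^|]        #   any single character except '|'
    )*
    \|              # literal '|'
    (.*?)           # group 1: as few characters as possible (no newline)
    \]\]            # literal ']]'
""", re.VERBOSE)

sample = "a [[x y|z]] b [[c]]"
print(LINK_RE.sub(r'\1', sample))      # a z b [[c]]
print(LINK_RE.sub(r'[[\1]]', sample))  # a [[z]] b [[c]]
```

Links without a '|' are left untouched in both cases, since the pattern requires a '|' before the next '[['.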