I have a text, let's say:
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
I want to replace [[time (sample)|tempor]] with tempor. The structure is always the same, [[string to remove|string to extract]], and it can appear several times in the text.
I tried a regular expression with Python's re module, but my attempt cuts off half of the text:
re.sub(r'\[.*?\|', '', text)
How can I replace the string?
You can use the following regex to capture only the relevant field:
r'\[\[[\w\s\(\)]+?\|(.+?)\]\]'
import re
regex = r'\[\[[\w\s\(\)]+?\|(.+?)\]\]'
text = '''
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore magna aliqua.
'''
# Raw string for the replacement, so that \g<1> is not parsed as an invalid string escape.
txt = re.sub(regex, r'[[\g<1>]]', text)
print(txt)
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit,
sed do eiusmod [[tempor]] incididunt ut [[labore]] et dolore magna aliqua.
Lorem ipsum dolor sit [[amet]], consectetur adipiscing elit, sed do eiusmod [[tempor]] incididunt ut [[labore]] et dolore magna aliqua.
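The output above keeps the double brackets around the extracted word. If the goal is the bare word, as the question asks, the same pattern can be reused with a replacement that consists only of the captured group. This is a minimal sketch; the helper name strip_piped_links is made up for illustration and is not part of the answer.

```python
import re

# Same pattern as in the answer above: [[target|label]], with the label captured.
PIPED_LINK = re.compile(r'\[\[[\w\s\(\)]+?\|(.+?)\]\]')

def strip_piped_links(text):
    """Replace every [[target|label]] with the bare label (hypothetical helper)."""
    return PIPED_LINK.sub(r'\1', text)

sample = 'sed do eiusmod [[time (sample)|tempor]] incididunt ut [[labore]] et dolore'
print(strip_piped_links(sample))
```

Note that the character class [\w\s\(\)] only admits word characters, whitespace and parentheses before the pipe, so a target such as [[a-b|c]] is left untouched by this pattern.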
Use
\[\[(?:(?!\[\[)[^|])*\|(.*?)]]
Replace with [[\1]] or \1, depending on the requirements.
EXPLANATION
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \[                       '['
--------------------------------------------------------------------------------
      \[                       '['
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [^|]                     any character except: '|'
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  ]]                       ']]'
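As a rough illustration of the pattern explained above, here is how it might be applied with Python's re.sub, next to the attempt from the question. The variable names and the one-line sample are assumptions made for this sketch.

```python
import re

# Pattern from the answer above. The (?!\[\[) look-ahead stops the part before
# the pipe from running across the start of another [[...]] link.
PATTERN = re.compile(r'\[\[(?:(?!\[\[)[^|])*\|(.*?)]]')

one_line = 'dolor sit [[amet]], elit, sed [[time (sample)|tempor]] ut [[labore]] et'

# The attempt from the question: starts at the first '[' and lazily runs to
# the first '|', which removes [[amet]] and everything in between.
naive = re.sub(r'\[.*?\|', '', one_line)

# The two replacements suggested in the answer.
keep_brackets = PATTERN.sub(r'[[\1]]', one_line)
plain = PATTERN.sub(r'\1', one_line)

print(naive)
print(keep_brackets)
print(plain)
```

The naive pattern fails because nothing prevents .*? from crossing over [[amet]] on its way to the first pipe. The look-ahead in the answer's pattern forces each match to begin at the [[ that directly precedes the pipe.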