Find all delimiters next to a substring in a string and replace them in Python
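Example string:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
I need to convert it to:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
This needs to work on both the opening and the closing tags, and for all delimiters (e.g. ".", ",", "-", "(", ")").
I could do a search and replace for ")" and so on, but obviously I want something more elegant.
So basically, move all delimiters outside of the tags.
Thanks!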
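The regex below moves a delimiter that sits inside the tags, directly before the closing tag, to directly after the closing tag.
(<sec>)([^.,()-]*)([.,()-])(<\/sec>)
Replacement string:
\1\2\4\3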
>>> import re
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<sec>)([^.,()-]*)([.,()-])(<\/sec>)', r'\1\2\4\3', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
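The pattern above moves one delimiter that sits directly before `</sec>`. The question also asks about delimiters directly after the opening tag, and about more than one delimiter in a row. A minimal sketch of one way to cover those cases with two passes follows; the function name `move_delimiters_out`, its `tag` parameter, and the `DELIMS` constant are hypothetical names chosen for illustration, not part of the original answer.

```python
import re

# One or more of the delimiters listed in the question (assumed set).
DELIMS = r"[.,()\-]+"

def move_delimiters_out(text, tag="sec"):
    """Move delimiters that touch a tag boundary from inside to outside the tag."""
    open_tag = re.escape(f"<{tag}>")
    close_tag = re.escape(f"</{tag}>")
    # Leading delimiters: "<sec>(Mary" becomes "(<sec>Mary"
    text = re.sub(rf"({open_tag})({DELIMS})", r"\2\1", text)
    # Trailing delimiters: "Mary)</sec>" becomes "Mary</sec>)"
    text = re.sub(rf"({DELIMS})({close_tag})", r"\2\1", text)
    return text

s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
print(move_delimiters_out(s))
# <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)
print(move_delimiters_out("<sec>(Mary).</sec>"))
# (<sec>Mary</sec>).
```

This only looks at delimiters that are adjacent to the tag itself; delimiters in the middle of the tagged text are left where they are.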
Or, to make it work for any tag:
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<(\S+?\b)[^>]*>)([^.,()-]*)([.,()-])(<\/\2>)', r'\1\3\5\4', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
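The any-tag pattern is dense, so it may help to see it spelled out. The sketch below is the same regex rewritten with `re.VERBOSE` and named groups; the names `TRAILING_DELIM` and `move_trailing_delimiter` are made up for this illustration, and the behavior is intended to be identical to the one-liner above (one delimiter, directly before the closing tag).

```python
import re

TRAILING_DELIM = re.compile(
    r"""
    (?P<open><(?P<name>\S+?\b)[^>]*>)   # opening tag; the tag name is captured
    (?P<body>[^.,()-]*)                 # tag content with no delimiter in it
    (?P<delim>[.,()-])                  # a single delimiter before the closing tag
    (?P<close></(?P=name)>)             # closing tag with the same name
    """,
    re.VERBOSE,
)

def move_trailing_delimiter(text):
    # Re-emit the pieces with the delimiter placed after the closing tag.
    return TRAILING_DELIM.sub(r"\g<open>\g<body>\g<close>\g<delim>", text)

print(move_trailing_delimiter('<b class="x">Mary)</b>'))
# <b class="x">Mary</b>)
```

The backreference `(?P=name)` is what ties the closing tag to the opening one, which is the role `\2` plays in the original pattern.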
Another regex variant:
>>> s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'((?:<[^>]+>)?)( *[-.(),]+ *)((?:</[^>]+>)?)',r'\3\2\1',s)
# The " *" on each side of [-.(),]+ moves adjacent spaces along with the punctuation;
# remove them if that is not needed.
'Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
The idea is to swap opening tag ↔ punctuation, or punctuation ↔ closing tag.
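One caveat with the swap variant: `<[^>]+>` in the first group also matches a closing tag, so punctuation that already follows `</sec>` gets pulled back inside (for example, "<sec>John</sec>, hi" becomes "<sec>John, </sec>hi"). A possible adjustment, offered as a sketch rather than as the answer's own method, is a negative lookahead `(?!/)` so that the first group only matches opening tags. The names `SWAP` and `swap_tags_and_punctuation` are hypothetical.

```python
import re

# Same three groups as the variant above; (?!/) keeps group 1 from matching "</...>".
SWAP = re.compile(r"((?:<(?!/)[^>]+>)?)( *[-.(),]+ *)((?:</[^>]+>)?)")

def swap_tags_and_punctuation(text):
    # Emit closing tag, then punctuation, then opening tag.
    return SWAP.sub(r"\3\2\1", text)

s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
print(swap_tags_and_punctuation(s))
# Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)
print(swap_tags_and_punctuation("<sec>John</sec>, hi"))
# <sec>John</sec>, hi
```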