简体   繁体   English

在字符串中的子字符串旁边找到所有定界符,并在python中替换

[英]Find all delimiters next to substring in string and replace in python

Sample string: 示例字符串:

s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"

I need to transform this to: 我需要将其转换为:

s = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"

This would need to work on both the start and end tag and for all delimiters like ".", ",", "-", "(", ")" and so on. 这将需要同时在开始和结束标记上以及所有分隔符(例如“。”,“,”,“-”,“(”,“)”)上进行工作。

I could just do a search and replace for ")" and so on, but obviously I would like something a bit more sexy. 我可以进行搜索并替换为“)”,依此类推,但是显然我想要更性感的东西。

So basically move all delimiters outside the tag. 因此,基本上将所有定界符移到标记之外。

Thanks! 谢谢!

The below regex would help you to move the delimiter which is present inside the opening and closing tag to the next of the closing tag. 下面的正则表达式将帮助您将在开始和结束标记中存在的定界符移动到结束标记的下一个。

(<sec>)([^.,()-]*)([.,()-])(<\/sec>)

REplacement string: 替换字符串:

\1\2\4\3

DEMO 演示

>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<sec>)([^.,()-]*)([.,()-])(<\/sec>)', r'\1\2\4\3', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'

OR 要么

This would work for any tags, 这适用于任何标签,

>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<(\S+?\b)[^>]*>)([^.,()-]*)([.,()-])(<\/\2>)', r'\1\3\5\4', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'

An other regex variation: 另一个正则表达式变体:

>>> s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'((?:<[^>]+>)?)( *[-.(),]+ *)((?:</[^>]+>)?)',r'\3\2\1',s)
#                           ^^        ^^
#                  move spaces with the punctuation
#                     remove that if not needed

'Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'

The idea is to swap opening tags ↔ punctuation or punctuation ↔ closing tag. 这个想法是交换开始标签↔标点符号标点符号↔关闭标签。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM