regex to match strings between two different separators like “<” and “>”
I want to split my text into contexts and captures, where I have lines like:
<abc> e f <ghi>
<abc> e f
e f <ghi>
Here I want to create rules that only affect the strings inside the <> markers. An example of the output would be:
<aabbcc> e f <gghhii>
<axyzbxyzcxyz> e f
e f <g_h_i_>
Using line.split('')[i] doesn't cut it because I have two different separators.
You can use re.sub to replace the parts within <...>, using a replacement callback function:
import re

def replace_function(match):
    # match.group(1) is the text captured between '<' and '>'; double each character.
    return '<' + ''.join(c + c for c in match.group(1)) + '>'

text = re.sub(r"<(.*?)>", replace_function, text)
This will duplicate the characters in each of the tags, but you can extend the function any way you want to perform more complex substitutions.
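As one possible way to extend the callback, here is a minimal sketch that builds a separate callback per rule. The helper make_replacer and the three per-character rules are hypothetical names, chosen only to reproduce the example outputs in the question. They are not part of the re module.

```python
import re

def make_replacer(transform):
    """Build a re.sub callback that applies `transform` to each character inside <...>."""
    def replace_function(match):
        # match.group(1) is the text between the markers; the markers are re-added.
        return '<' + ''.join(transform(c) for c in match.group(1)) + '>'
    return replace_function

# Hypothetical rules matching the three example outputs in the question.
double = make_replacer(lambda c: c + c)
add_xyz = make_replacer(lambda c: c + 'xyz')
add_underscore = make_replacer(lambda c: c + '_')

# Non-greedy group so each <...> pair is matched separately.
pattern = re.compile(r"<(.*?)>")

print(pattern.sub(double, "<abc> e f <ghi>"))    # <aabbcc> e f <gghhii>
print(pattern.sub(add_xyz, "<abc> e f"))         # <axyzbxyzcxyz> e f
print(pattern.sub(add_underscore, "e f <ghi>"))  # e f <g_h_i_>
```

Text outside the markers is never passed to the callback, so it is left untouched.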