
Regex to match strings between two different separators like "<" and ">"

I want to split my text into contexts and captures, where I have lines like:

<abc> e f <ghi>
<abc> e f
e f <ghi>

Here I want to create rules that only affect the strings inside the <> markers. An example of the output would be:

<aabbcc> e f <gghhii>
<axyzbxyzcxyz> e f 
e f <g_h_i_>

Using line.split('')[i] doesn't cut it because I have two different separators.

You can use re.sub with a replacement callback function to replace the parts within <...>:

import re

def replace_function(match):
    # match.group(1) is the text between '<' and '>'; double each character
    return '<' + ''.join(c + c for c in match.group(1)) + '>'

text = re.sub(r"<(.*?)>", replace_function, text)

This duplicates the characters inside each tag, but you can extend the function however you like to perform more complex substitutions.
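As a rough sketch of one way to extend it, the per-character rule can be passed in as a parameter so the same regex covers all three outputs from the question. The helper name transform_tags and its per_char argument are made up for illustration and are not part of the re module:

```python
import re

def transform_tags(text, per_char):
    """Apply per_char to every character inside <...>; text outside is untouched.

    transform_tags and per_char are hypothetical names used for this sketch.
    """
    def replace_function(match):
        # match.group(1) is the text between '<' and '>'
        return '<' + ''.join(per_char(c) for c in match.group(1)) + '>'

    # Non-greedy .*? stops at the first '>' so each tag is handled separately
    return re.sub(r"<(.*?)>", replace_function, text)

print(transform_tags("<abc> e f <ghi>", lambda c: c + c))    # <aabbcc> e f <gghhii>
print(transform_tags("<abc> e f", lambda c: c + "xyz"))      # <axyzbxyzcxyz> e f
print(transform_tags("e f <ghi>", lambda c: c + "_"))        # e f <g_h_i_>
```

Which rule applies to which tag or line is left to the caller here; the question does not say how the rules are chosen.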


 