Find all occurrences of regex pattern, but ignore occurrences that contain another pattern
I have a block of text that I'm trying to parse:
「<%sM_item2><%sM_plusnum2>の| <%sM_slot>の部分を| <%sM_change_color>に カラーリングするのですね?|<br>|「それでは <%sM_item>が 10本と| <%nM_gold>ゴールドが必要ですが よろしいですか?|<yesno><close>
In this block of text, I'm trying to regex split on all occurrences of <???>, EXCEPT when the match is of the form <%???>.
I have it mostly working with this:
re.split(r'<((?!%).+?)>', source_text)
['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね?|', 'br', '|「それでは\u3000<%sM_item>が\u300010本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか?|', 'yesno', '', 'close', '']
My problem is that although it kept the <%???> tags in place, it stripped the <> characters from the matches (notice that the 'yesno', 'close', and 'br' tags no longer have those characters).
Based on the documentation of re.split:
Split string by the occurrences of pattern. If capturing parentheses are used
in pattern, then the text of all groups in the pattern are also returned as
part of the resulting list.
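To make the quoted behaviour concrete, here is a small sketch comparing three placements of the parentheses. The sample string is made up for illustration and is not taken from the question; only the returned pieces differ between the three calls.

```python
import re

# Hypothetical sample: one <%...> tag to keep in place, one <...> tag to split on.
sample = "a<%x>b<br>c"

# No capturing group: the delimiter is dropped from the result entirely.
no_group = re.split(r"<(?!%).+?>", sample)

# Group inside the angle brackets: only the captured text ("br") comes back.
inner_group = re.split(r"<((?!%).+?)>", sample)

# Group around the whole match: the delimiter comes back with its brackets.
outer_group = re.split(r"(<(?!%).+?>)", sample)

print(no_group)     # ['a<%x>b', 'c']
print(inner_group)  # ['a<%x>b', 'br', 'c']
print(outer_group)  # ['a<%x>b', '<br>', 'c']
```

In all three cases the negative lookahead `(?!%)` is not a capturing group, so it never adds anything to the list by itself.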
In this case, my parentheses need to be placed on the outside of the match to preserve the <> characters.
re.split(r'(<(?!%).+?>)', source_text)
['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね?|', '<br>', '|「それでは\u3000<%sM_item>が\u300010本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか?|', '<yesno>', '', '<close>', '']
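The result above still contains empty strings wherever two tags are adjacent or a tag ends the text. If those are unwanted, one possible approach is to filter them out, sketched below. The helper name `split_on_tags`, the shortened sample text, and the switch from `.+?` to `[^<>]+` are all assumptions made for this illustration and are not part of the original question.

```python
import re

# Assumption: a tag never contains '<' or '>' itself, so [^<>]+ is used
# instead of .+? to keep a match from running past the closing '>'.
TAG_PATTERN = re.compile(r"(<(?!%)[^<>]+>)")


def split_on_tags(text):
    """Split on <tag> markers, keeping them in the result; <%tag> stays in place."""
    # re.split yields '' between adjacent tags and after a trailing tag;
    # the filter drops those empty pieces.
    return [piece for piece in TAG_PATTERN.split(text) if piece]


# Shortened, hypothetical variant of the text from the question.
source_text = "「<%sM_item>が 10本と|<br>|<%nM_gold>ゴールド|<yesno><close>"
parts = split_on_tags(source_text)
print(parts)
# ['「<%sM_item>が 10本と|', '<br>', '|<%nM_gold>ゴールド|', '<yesno>', '<close>']
```

Whether to drop the empty strings depends on what the caller does next; if positions in the list matter (text, tag, text, tag, ...), keeping them as `re.split` returns them may be the safer choice.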