
Find all occurrences of regex pattern, but ignore occurrences that contain another pattern

I have a block of text that I'm trying to parse:

「<%sM_item2><%sM_plusnum2>の| <%sM_slot>の部分を| <%sM_change_color>に カラーリングするのですね?|<br>|「それでは <%sM_item>が 10本と| <%nM_gold>ゴールドが必要ですが よろしいですか?|<yesno><close>

In this block of text, I'm trying to split with a regex on every occurrence of <???>, except when it matches <%???>.

I have it mostly working with this:

re.split(r'<((?!%).+?)>', source_text)

['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね?|', 'br', '|「それでは\u3000<%sM_item>が\u300010本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか?|', 'yesno', '', 'close', '']

My problem is that although it kept the <%???> tags in place, it stripped the <> characters from the matches (notice that the 'yesno', 'close', and 'br' tags no longer have them).

Based on the documentation of re.split:

Split string by the occurrences of pattern. If capturing parentheses are used 
in pattern, then the text of all groups in the pattern are also returned as 
part of the resulting list.
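
To make the quoted behavior concrete, here is a minimal sketch on a made-up string (the `sample` value and variable names are illustrative assumptions, not from the question). It compares no capturing group, a group inside the angle brackets, and a group around the whole delimiter:

```python
import re

# Hypothetical sample string, chosen only to illustrate the behavior.
sample = "a<x>b<y>c"

# No capturing group: the delimiters are dropped from the result.
no_group = re.split(r'<.+?>', sample)

# Group inside the brackets: only the captured inner text is returned.
inner_group = re.split(r'<(.+?)>', sample)

# Group around the whole delimiter: the full tag, brackets included, is returned.
outer_group = re.split(r'(<.+?>)', sample)

print(no_group)     # ['a', 'b', 'c']
print(inner_group)  # ['a', 'x', 'b', 'y', 'c']
print(outer_group)  # ['a', '<x>', 'b', '<y>', 'c']
```

Only the text inside the capturing group comes back in the list, so whatever should be preserved has to sit inside the parentheses.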

In this case, the parentheses need to wrap the entire match, angle brackets included, so that the <> characters are preserved.

re.split(r'(<(?!%).+?>)', source_text)
['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね?|', '<br>', '|「それでは\u3000<%sM_item>が\u300010本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか?|', '<yesno>', '', '<close>', '']
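
The empty strings in that result come from adjacent tags and from a tag at the very end of the string. If they are unwanted, one option is to filter them out after splitting. The sketch below uses a shortened, hypothetical `source_text` modeled on the question's data, not the full original string:

```python
import re

# Shortened stand-in for the question's text (an assumption, for illustration only).
source_text = "<%sM_item2>の|<br>|<%nM_gold>ゴールド|<yesno><close>"

# Capture the whole tag so the angle brackets survive;
# the (?!%) lookahead makes <%...> placeholders ineligible as split points.
TAG = re.compile(r'(<(?!%).+?>)')

parts = TAG.split(source_text)

# Adjacent tags and a trailing tag yield empty strings; drop them.
tokens = [p for p in parts if p]

print(tokens)
# ['<%sM_item2>の|', '<br>', '|<%nM_gold>ゴールド|', '<yesno>', '<close>']
```

Whether to drop the empty strings depends on the consumer: keeping them preserves the strict text/tag alternation of the list, which can be handy when positions matter.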
 
