Find all occurrences of regex pattern, but ignore occurrences that contain another pattern

Question

I have a block of text that I'm trying to parse:

「<%sM_item2><%sM_plusnum2>の|　<%sM_slot>の部分を|　<%sM_change_color>に　カラーリングするのですね？|<br>|「それでは　<%sM_item>が　１０本と|　<%nM_gold>ゴールドが必要ですが　よろしいですか？|<yesno><close>

In this block of text, I'm trying to use a regex to split on every <???> tag, EXCEPT the ones of the form <%???>.

I have it mostly working with this:

re.split(r'<((?!%).+?)>', source_text)

['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね？|', 'br', '|「それでは\u3000<%sM_item>が\u3000１０本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか？|', 'yesno', '', 'close', '']

My problem is that, although it kept the <%???> tags in place, it somehow stripped the < and > characters from the tags it split on (notice that 'br', 'yesno', and 'close' no longer have them).

Answer 1

From the documentation of re.split:

Split string by the occurrences of pattern. If capturing parentheses are used 
in pattern, then the text of all groups in the pattern are also returned as 
part of the resulting list.
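A minimal sketch of what that rule means here, using a short made-up string `sample` rather than the question's text. It compares no group, a group around the inner text only (the question's pattern), and a group around the whole tag:

```python
import re

# Illustrative sample (not from the question): one %-tag and one plain tag.
sample = "a<%x>b<br>c"

# No capturing group: the matched tag is dropped from the result entirely.
no_group = re.split(r'<(?!%).+?>', sample)

# Group around the inner text only: just 'br' is returned, brackets lost.
inner_group = re.split(r'<((?!%).+?)>', sample)

# Group around the whole tag: '<br>' is returned with its brackets.
outer_group = re.split(r'(<(?!%).+?>)', sample)

print(no_group)     # ['a<%x>b', 'c']
print(inner_group)  # ['a<%x>b', 'br', 'c']
print(outer_group)  # ['a<%x>b', '<br>', 'c']
```

In every case the <%x> tag is left inside the text, because the (?!%) lookahead rejects a < that is followed by %.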

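In this case, my original group only captured the text between < and >, so that is all re.split returned. The parentheses need to be placed around the whole match, brackets included, to preserve the <>.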

re.split('(<(?!%).+?>)', source_text)
['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね？|', '<br>', '|「それでは\u3000<%sM_item>が\u3000１０本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか？|', '<yesno>', '', '<close>', '']
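A possible refinement, offered as a sketch rather than a correction: using [^>]+ instead of .+? guarantees a single match never runs past the first > (for example on an empty <>), and a list comprehension drops the empty strings that re.split produces between adjacent tags (<yesno><close>) and at the end of the string. re.findall with the same pattern returns only the tags. The names TAG, parts and tags are illustrative.

```python
import re

source_text = (
    "「<%sM_item2><%sM_plusnum2>の|　<%sM_slot>の部分を|　<%sM_change_color>に　"
    "カラーリングするのですね？|<br>|「それでは　<%sM_item>が　１０本と|　"
    "<%nM_gold>ゴールドが必要ですが　よろしいですか？|<yesno><close>"
)

# A tag whose first character is not '%'; [^>]+ cannot cross a '>'.
# The group wraps the whole tag, so split() keeps the brackets.
TAG = re.compile(r'(<(?!%)[^>]+>)')

# Split on the tags, keep them, and discard empty strings.
parts = [p for p in TAG.split(source_text) if p]

# Only the tags themselves (findall returns the single group's text).
tags = TAG.findall(source_text)

print(parts)
print(tags)  # ['<br>', '<yesno>', '<close>']
```

Filtering out the empty strings is a choice: keep them if you need to know where two tags were adjacent.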


solution1
0 2021-11-22 03:42:25