Regex: split by + unless inside brackets

Question

I'm dealing with equations like 'x_{t+1}+y_{t}=z_{t-1}'. My objective is to obtain all "variables", that is, a list containing x_{t+1}, y_{t} and z_{t-1}.

I'd like to split the string on any of the operators +, -, =, * and /, but not when + or - appears inside {}.

Something like re.split('(?<!t)[\+\-\=]','x_{t+1}+y_{t}=z_{t-1}') partly does the job: it does not split when the symbol is immediately preceded by t. But I'd like a more general solution. Assume there are no nested brackets.
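To illustrate why the lookbehind attempt is not general, here is a minimal sketch. The second input, which uses s instead of t as the index, is made up for illustration and does not come from the question.

```python
import re

# The attempt from the question: refuse to split only right after a literal "t".
pattern = r'(?<!t)[\+\-\=]'

works = re.split(pattern, 'x_{t+1}+y_{t}=z_{t-1}')
breaks = re.split(pattern, 'x_{s+1}+y_{s}=z_{s-1}')  # hypothetical input

print(works)   # ['x_{t+1}', 'y_{t}', 'z_{t-1}']
print(breaks)  # ['x_{s', '1}', 'y_{s}', 'z_{s', '1}']
```

The pattern depends on the index name rather than on the braces, so any other index letter is split in the middle of the subscript.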

How can I do this?

Answer 1

Instead of splitting at those characters, you could find sequences made up of all other characters (like x and _) and whole bracket parts (like {t+1}). The first such sequence in the example is x, _, {t+1}, i.e., the substring x_{t+1}.

import re

s = 'x_{t+1}+y_{t}=z_{t-1}'

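# Each match is a run of either a complete {...} group (matched lazily)
# or a single character that is not one of the operators - + = * /
print(re.findall(r'(?:\{.*?}|[^-+=*/])+', s))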

Output:

['x_{t+1}', 'y_{t}', 'z_{t-1}']
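If an actual split is preferred, one possible alternative (not part of this answer) is a negative lookahead that refuses to split when a closing brace comes before the next opening brace. This is only a sketch: the names OPERATOR_OUTSIDE_BRACES and split_variables are hypothetical, and it relies on the question's assumption that braces are balanced and never nested.

```python
import re

# Split on an operator only if it is NOT followed by "}" before the next "{".
# Assumption: braces are balanced and never nested, as stated in the question.
OPERATOR_OUTSIDE_BRACES = re.compile(r'[-+=*/](?![^{]*\})')

def split_variables(equation):
    """Hypothetical helper: split an equation on operators outside braces."""
    # Drop empty pieces, e.g. the one produced by a leading minus sign.
    return [part for part in OPERATOR_OUTSIDE_BRACES.split(equation) if part]

print(split_variables('x_{t+1}+y_{t}=z_{t-1}'))  # ['x_{t+1}', 'y_{t}', 'z_{t-1}']
print(split_variables('a_{i*2}/b_{j}'))          # ['a_{i*2}', 'b_{j}']
```

Inside a brace group, the text after the operator reaches a } without passing a {, so the lookahead blocks the split. Outside a group, the next brace encountered (if any) is a {, so the split goes ahead.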

Answer 2

Instead of re.split, consider using re.findall to match only the variables:

>>> import re
>>> re.findall(r"[a-z0-9]+(?:_\{[^\}]+\})?","x_{t+1}+y_{t}=z_{t-1}+pi", re.IGNORECASE)
['x_{t+1}', 'y_{t}', 'z_{t-1}', 'pi']


Explanation of regex:

[a-z0-9]+(?:_\{[^\}]+\})?
[a-z0-9]+                : One or more alphanumeric characters
         (?:           )?: A non-capturing group, optional
            _\{      \}  : Underscore, and opening/closing brackets
               [^\}]+    : One or more non-close-bracket characters
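The same breakdown can be kept next to the pattern itself with re.VERBOSE. The following is a sketch only: the name VARIABLE and the two input strings are made up for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries a comment.
VARIABLE = re.compile(r"""
    [a-z0-9]+        # base name: one or more letters or digits
    (?:              # optional subscript, non-capturing
        _\{          #   underscore and opening brace
        [^}]+        #   one or more characters other than a closing brace
        \}           #   closing brace
    )?
""", re.IGNORECASE | re.VERBOSE)

# Hypothetical inputs: spaces and parentheses are skipped, not matched.
print(VARIABLE.findall('(x_{t+1} + y_{t}) = z_{t-1}'))  # ['x_{t+1}', 'y_{t}', 'z_{t-1}']

# Numeric literals also satisfy [a-z0-9]+, so they show up in the result.
print(VARIABLE.findall('2*x_{t}'))  # ['2', 'x_{t}']
```

Because this approach matches variables rather than splitting on operators, characters such as spaces and parentheses never end up in the result. The trade-off is that bare numbers count as matches and may need to be filtered out afterwards.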

Answer 1
3 2022-07-11 22:02:05

Answer 2
2 2022-07-11 22:06:27
