regex: split by + except if inside brackets
I'm dealing with equations like 'x_{t+1}+y_{t}=z_{t-1}'. My objective is to obtain all "variables", that is, a list with x_{t+1}, y_{t}, z_{t-1}.
I'd like to split the string by [+-=*/], but not if + or - are inside {}.
Something like re.split('(?<!t)[\+\-\=]','x_{t+1}+y_{t}=z_{t-1}') partly does the job by not splitting if it observes t followed by a symbol. But I'd like to be more general. Assume there are no nested brackets.
How can I do this?
Instead of splitting at those characters, you could find sequences of all other characters (like x and _) and bracket parts (like {t+1}). The first such sequence in the example is x, _, {t+1}, i.e., the substring x_{t+1}.
import re
s = 'x_{t+1}+y_{t}=z_{t-1}'
print(re.findall(r'(?:\{.*?}|[^-+=*/])+', s))
Output:
['x_{t+1}', 'y_{t}', 'z_{t-1}']
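If splitting is still preferred over matching, a negative lookahead might do it. This is only a sketch, and it relies on the assumption stated in the question that brackets are never nested: an operator counts as "inside {}" when a closing } can be reached without first passing another { or }. The name parts is just for illustration.

```python
import re

s = 'x_{t+1}+y_{t}=z_{t-1}'

# Split on an operator only if it is NOT followed by "non-brace characters, then }".
# With no nested brackets, that lookahead succeeds exactly when the operator
# sits inside a {...} group, so those operators are left alone.
parts = re.split(r'[-+=*/](?![^{}]*\})', s)
print(parts)
```

Note that the hyphen is placed first in the character class so it is read as a literal minus sign rather than as a range.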
Instead of re.split, consider using re.findall to match only the variables:
>>> re.findall(r"[a-z0-9]+(?:_\{[^\}]+\})?","x_{t+1}+y_{t}=z_{t-1}+pi", re.IGNORECASE)
['x_{t+1}', 'y_{t}', 'z_{t-1}', 'pi']
Explanation of regex:
[a-z0-9]+(?:_\{[^\}]+\})?
[a-z0-9]+ : One or more alphanumeric characters
(?: )?: A non-capturing group, optional
_\{ \} : Underscore, and opening/closing brackets
[^\}]+ : One or more non-close-bracket characters
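The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. A minimal sketch; the names VARIABLE and variables are only illustrative:

```python
import re

# Same pattern as above, written in verbose mode so each piece carries its comment.
VARIABLE = re.compile(
    r"""
    [a-z0-9]+        # one or more alphanumeric characters
    (?:              # start of the optional non-capturing group
        _\{          #   underscore and opening bracket
        [^}]+        #   one or more non-close-bracket characters
        \}           #   closing bracket
    )?               # the whole subscript part is optional
    """,
    re.IGNORECASE | re.VERBOSE,
)

variables = VARIABLE.findall("x_{t+1}+y_{t}=z_{t-1}+pi")
print(variables)
```

This should print the same list as the one-line version, since only the layout of the pattern changes.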