
regex: split by + except if inside brackets

I'm dealing with equations like 'x_{t+1}+y_{t}=z_{t-1}'. My objective is to obtain all "variables", that is, a list with x_{t+1}, y_{t}, z_{t-1}.

I'd like to split the string by [+-=*/], but not if the + or - is inside {}.

Something like re.split('(?<!t)[\+\-\=]','x_{t+1}+y_{t}=z_{t-1}') partly does the job, because it does not split when the symbol is preceded by t. But I'd like something more general. Assume there are no nested brackets.

How can I do this?
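To make the limitation of the lookbehind attempt concrete, here is a small sketch that runs it on the original equation and on two other inputs. The extra inputs ('x_{s+1}+y_{s}' and 'a_{t}+t+b') are made up for illustration and do not come from the question.

```python
import re

# The question's attempt: split on + - = unless the previous character is 't'.
attempt = re.compile(r'(?<!t)[+\-=]')

original = attempt.split('x_{t+1}+y_{t}=z_{t-1}')  # the example from the question
other_index = attempt.split('x_{s+1}+y_{s}')       # index variable is s, not t
bare_t = attempt.split('a_{t}+t+b')                # a top-level variable named t

print(original)     # ['x_{t+1}', 'y_{t}', 'z_{t-1}']
print(other_index)  # ['x_{s', '1}', 'y_{s}']  (split inside the braces)
print(bare_t)       # ['a_{t}', 't+b']         (missed the '+' after t)
```

The lookbehind only checks the single preceding character, so it depends on the index always being t and on no top-level term ending in t.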

Instead of splitting at those characters, you could find sequences of all other characters (like x and _) and bracket parts (like {t+1}). The first such sequence in the example is x, _, {t+1}, i.e., the substring x_{t+1}.

import re

s = 'x_{t+1}+y_{t}=z_{t-1}'

print(re.findall(r'(?:\{.*?}|[^-+=*/])+', s))

Output:

['x_{t+1}', 'y_{t}', 'z_{t-1}']
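If you would still rather split than match, one possible alternative (not part of the answer above) is a negative lookahead that rejects any operator followed by a closing brace with no other brace in between. This is only a sketch: it relies on the question's assumption that braces are balanced and never nested, and the helper name split_variables is made up for illustration.

```python
import re

# An operator is "inside {...}" if a '}' follows it before any other brace.
# The negative lookahead rejects exactly those operators.
# Assumption: braces are balanced and not nested, as stated in the question.
OUTSIDE_BRACES = re.compile(r'[-+=*/](?![^{}]*\})')

def split_variables(equation):
    """Split on top-level operators and drop empty pieces (hypothetical helper)."""
    return [part for part in OUTSIDE_BRACES.split(equation) if part]

print(split_variables('x_{t+1}+y_{t}=z_{t-1}'))
# ['x_{t+1}', 'y_{t}', 'z_{t-1}']
```

Empty pieces are filtered out so that a leading operator, as in '-a_{t-1}*b', does not produce an empty first element.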

Instead of re.split, consider using re.findall to match only the variables:

>>> re.findall(r"[a-z0-9]+(?:_\{[^\}]+\})?","x_{t+1}+y_{t}=z_{t-1}+pi", re.IGNORECASE)
['x_{t+1}', 'y_{t}', 'z_{t-1}', 'pi']


Explanation of regex:

[a-z0-9]+(?:_\{[^\}]+\})?
[a-z0-9]+                : One or more alphanumeric characters
         (?:           )?: A non-capturing group, optional
            _\{      \}  : Underscore, and opening/closing brackets
               [^\}]+    : One or more non-close-bracket characters
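The two findall patterns in this thread agree on the original example but can differ on other input. As a rough illustration, the sketch below runs both on the same equation written with spaces around the operators. The spaced input and the names RUN_PATTERN and NAME_PATTERN are assumptions added here, not from the answers.

```python
import re

# Patterns copied from the two answers in this thread.
RUN_PATTERN = re.compile(r'(?:\{.*?}|[^-+=*/])+')
NAME_PATTERN = re.compile(r'[a-z0-9]+(?:_\{[^\}]+\})?', re.IGNORECASE)

spaced = 'x_{t+1} + y_{t} = z_{t-1}'  # assumed input: same equation with spaces

runs = RUN_PATTERN.findall(spaced)    # matches everything that is not an operator
names = NAME_PATTERN.findall(spaced)  # matches only name-like tokens

print(runs)   # ['x_{t+1} ', ' y_{t} ', ' z_{t-1}']
print(names)  # ['x_{t+1}', 'y_{t}', 'z_{t-1}']

# Stripping whitespace makes the first pattern's output match the second.
print([r.strip() for r in runs] == names)  # True
```

The first pattern keeps any non-operator character, whitespace included, so its matches may need a strip(). The second is stricter about what counts as a variable and therefore skips the spaces.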
