
How to extract substrings between brackets while ignoring those between nested brackets in Python?

I have a string:

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'

How can I extract only the substrings that are enclosed between brackets and do not themselves contain any brackets? From my example I need two outputs: "s2:0.4186036213,s3:0.4186036213" and "s4:0.1429514535,s5:0.1429514535".

You can use regular expressions:

import re

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
re.findall(r'\(([^\(\)]*)\)', phy)
# ['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']

This captures every run of non-bracket characters that sits between an opening and a closing bracket. It does not, however, validate that the brackets are correctly nested.
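Because the regex alone does not check nesting, one option is to validate the string first with a simple depth counter and only then run `re.findall`. This is a minimal sketch that assumes only round brackets matter; `brackets_balanced` and `innermost_groups` are illustrative helper names, not library functions.

```python
import re

def brackets_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0  # False if some '(' was never closed

def innermost_groups(text):
    """Return the contents of bracket pairs that contain no brackets."""
    if not brackets_balanced(text):
        raise ValueError('unbalanced brackets')
    return re.findall(r'\(([^()]*)\)', text)

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
print(innermost_groups(phy))
# ['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']
print(brackets_balanced('(a,(b)'))
# False
```

The counter only tracks depth, so it is linear in the length of the string and leaves the extraction itself to the original regex.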

Try this:

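# Map each opening bracket to its closing partner
bracket_pairs = {
    '(': ')',
    '{': '}',
    '[': ']',
}
closing_brackets = set(bracket_pairs.values())

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
inner_items = []
stack = []          # opening brackets that have not been closed yet
start_index = None  # position of the most recent opening bracket
innermost = False   # True while no other bracket has followed start_index

for i, ch in enumerate(phy):
    if ch in bracket_pairs:
        stack.append(ch)
        start_index = i
        innermost = True
    elif ch in closing_brackets:
        if not stack or bracket_pairs[stack.pop()] != ch:
            raise ValueError(f'mismatched {ch!r} at index {i}')
        if innermost:
            # only non-bracket text since the last opening bracket
            inner_items.append(phy[start_index + 1 : i])
        innermost = False

if stack:
    raise ValueError('unclosed bracket(s): ' + ''.join(stack))

print(inner_items)
#['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']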
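The loop above also recognizes `{}` and `[]`. If the same is wanted from a single regular expression, one option (a sketch, not part of the original answer) is an alternation with one branch per bracket type. Each branch forbids all six bracket characters inside, so only innermost groups match, and the opening and closing characters must be of the same kind. `innermost_any` is a hypothetical helper name.

```python
import re

# Contents may not contain any of the six bracket characters,
# so a match can only be an innermost group.
INNER = r'[^()\[\]{}]*'
pattern = re.compile(
    r'\((' + INNER + r')\)'    # group 1: ( ... )
    r'|\[(' + INNER + r')\]'   # group 2: [ ... ]
    r'|\{(' + INNER + r')\}'   # group 3: { ... }
)

def innermost_any(text):
    # Exactly one of the three groups takes part in each match;
    # m.lastindex is the number of that group.
    return [m.group(m.lastindex) for m in pattern.finditer(text)]

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654);'
print(innermost_any(phy))
# ['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']
print(innermost_any('a[b(c)d]{e}'))
# ['c', 'e']
print(innermost_any('[x)'))
# []
```

Like the first regex, this does not reject badly nested input such as `[x)`; it simply finds no match there, whereas the loop raises an error.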

Use a regular expression:

import re

reg = re.compile(r'[(]([^()]+)[)]')

phy = '(s1:0.6507212936,((s2:0.4186036213,s3:0.4186036213):0.1428084058,((s4:0.1429514535,s5:0.1429514535):0.1695879844,s6:0.3125394379):0.2488725892):0.08930926654)'

print(reg.findall(phy))

Output:

C:\Users\Desktop>py x.py
['s2:0.4186036213,s3:0.4186036213', 's4:0.1429514535,s5:0.1429514535']
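Writing `[(]` instead of `\(` is only a stylistic difference; the real change from the first answer's pattern is `+` instead of `*`. With `+`, an empty pair `()` is skipped; with `*`, it produces an empty string. The sketch below shows this on a made-up string, and also uses `re.finditer` to report where each captured group starts and ends, in case positions are needed.

```python
import re

text = 'a(),(b,c),(d(e))'  # made-up input containing an empty pair

star = re.findall(r'\(([^()]*)\)', text)    # first answer's pattern
plus = re.findall(r'[(]([^()]+)[)]', text)  # this answer's pattern

print(star)  # ['', 'b,c', 'e']
print(plus)  # ['b,c', 'e']

# finditer returns match objects, so the span of each group is available
for m in re.finditer(r'[(]([^()]+)[)]', text):
    print(m.group(1), m.span(1))
# b,c (5, 8)
# e (13, 14)
```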


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM