Error in regular expression to find the text between parentheses
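I have a string:
import re
string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
I want to split it into a list based on the parentheses, so I followed a solution I was given:
my_data = re.findall(r"(\(.*?\))",string)
But when I print my_data, the output is (len = 4):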
['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']
But the output I want is (len = 2):
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
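That is because "(clearance) AND (embedded) AND (software engineer OR developer)" sits inside one pair of parentheses and "embedded" sits inside another. Yet re.findall splits the string into 4 items. Why?
How should I modify the regular expression to get the output I want?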
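In pure regex this is not possible in general, since Python's re module cannot match arbitrarily nested parentheses, so here is an idea that counts parentheses instead: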
def find_stuff(string):
    indices = []
    counter = 0  # current nesting depth
    change = {"(":1, ")":-1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter==0 and new_count==1:
            # an outermost "(" opens here
            indices.append(i)
        elif counter==1 and new_count==0:
            # an outermost ")" closes here; store the slice end
            indices.append(i+1)
        counter = new_count
    return indices
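This is not very pretty, but I think the concept is clear. It returns the indices of the outer parentheses, so you can use them to slice the string.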
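The slicing step is not spelled out above, so here is a minimal sketch of how it might look. The helper name slice_outer_groups and the final step that strips one layer of parentheses are assumptions added for illustration; they are not part of the original answer. The stripping rule (drop the outer pair only when the group contains nested parentheses) is simply what reproduces the output asked for in the question.

```python
def find_stuff(string):
    """Return start/end indices of every top-level parenthesised group."""
    indices = []
    counter = 0  # current nesting depth
    change = {"(": 1, ")": -1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            indices.append(i)        # start of an outer group
        elif counter == 1 and new_count == 0:
            indices.append(i + 1)    # slice end of an outer group
        counter = new_count
    return indices


def slice_outer_groups(string):
    # Hypothetical helper: pair the indices up as (start, end) and slice.
    idx = find_stuff(string)
    return [string[start:end] for start, end in zip(idx[::2], idx[1::2])]


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

groups = slice_outer_groups(string)
print(groups)

# Assumption: drop the outer pair only when the group holds nested parentheses.
wanted = [g[1:-1] if '(' in g[1:-1] else g for g in groups]
print(wanted)
```

Note that groups still carries the outer parentheses of the first item; only the extra stripping step yields the two-element list from the question.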
It is a bit of an re hack, but it is possible:
>>> string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
>>> [e for e in re.split(r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))',string) if e and '(' in e and ')' in e]
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
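To make the one-liner easier to read, the same pattern can be written with re.VERBOSE and commented piece by piece. This is only a sketch of how the hack appears to work: the variable names pattern, parts and result are illustrative, and the pattern handles one level of nesting in strings shaped like this one, not arbitrary depth.

```python
import re

pattern = re.compile(r"""
    \( (?=\()            # an opening parenthesis directly followed by another one
    (.*?)                # group 1: lazily capture everything inside the outer pair
    (?<=\)) \)           # stop at a closing parenthesis that directly follows another one
    |
    (?<!\()              # otherwise: a group not directly preceded by an opening parenthesis
    ( \( [^()]+ \) )     # group 2: a flat group with no nested parentheses
    (?!\))               # and not directly followed by a closing parenthesis
""", re.VERBOSE)

string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

# re.split keeps captured groups in its result; unmatched groups appear as None.
parts = pattern.split(string)
result = [e for e in parts if e and '(' in e and ')' in e]
print(result)
```

The filter at the end discards the None entries and the leftover " AND " separators that re.split returns alongside the captured groups.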