Extract string between two brackets, including nested brackets in python
Extract string inside nested brackets
I need to extract strings from nested brackets, like so:
[ this is [ hello [ who ] [what ] from the other side ] slim shady ]
Result (order doesn't matter):
This is slim shady
Hello from the other side
Who
What
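Note that the string can contain N brackets. They are always valid, but may or may not be nested. Also, the string does not have to start with a bracket.
The solutions I found online for similar problems suggest a regex, but I'm not sure it will work in this case.
I was thinking of implementing this much like the check for whether a string has all valid parentheses:
Walk through the string. If we see a [ we push its index onto the stack; if we see a ] we take the substring from that index to the current position.
However, we would need to erase that substring from the original string so that it does not show up in any of the other outputs. So, instead of pushing just the index onto the stack, I was thinking of building a LinkedList as we go, inserting a node whenever we find a [. That would let us delete the substring from the LinkedList easily.
Would this be a good approach, or is there a cleaner, known solution?
EDIT: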
'[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
Should return (order doesn't matter):
this is slim shady
hello from the other
who
what
side
oh my
g
a
w
d
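Whitespace doesn't matter; it is trivial to remove afterwards. What matters is being able to tell apart the contents of the different brackets, either by putting them on separate lines or by returning a list of strings.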
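This code scans the text character by character, pushes an empty list onto the stack for every opening [ and pops the most recently pushed list off the stack for every closing ].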
text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

def parse(text):
    stack = []
    for char in text:
        if char == '[':
            #stack push
            stack.append([])
        elif char == ']':
            yield ''.join(stack.pop())
        else:
            #stack peek
            stack[-1].append(char)

print(tuple(parse(text)))
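Output (the first line is for the text above, the second for the longer string from the question's edit):
(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')
(' who ', 'what ', 'side', ' hello   from the other  ', ' this is  slim shady ', 'd', 'w', 'a', 'g', 'oh my ')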
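The generator above assumes every character sits inside a bracket, yet the question says the string need not start with one. A minimal sketch of a variant that skips text outside all brackets and collapses runs of whitespace follows. The name `extract_groups` is made up for illustration, and balanced brackets are assumed, as the question states.

```python
def extract_groups(text):
    """Return the text of every bracket group, innermost groups first."""
    stack = []   # one list of characters per currently open bracket
    groups = []
    for char in text:
        if char == '[':
            stack.append([])
        elif char == ']':
            # Assumes balanced brackets; an unmatched ']' would raise IndexError.
            raw = ''.join(stack.pop())
            groups.append(' '.join(raw.split()))  # collapse whitespace runs
        elif stack:
            stack[-1].append(char)
        # Characters outside every bracket are ignored.
    return groups


print(extract_groups('x [ this is [ hello [ who ] [what ] from the other side ] slim shady ] y'))
```

Returning a list instead of yielding makes the result easy to index or sort, at the cost of holding all groups in memory at once.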
This can be solved quite comfortably with a regex:
import re

s = '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
result = []
pattern = r'\[([^[\]]*)\]'  # regex pattern to find non-nested square brackets
while '[' in s:  # while brackets remain
    result.extend(re.findall(pattern, s))  # find them all and add them to the list
    s = re.sub(pattern, '', s)  # then remove them
# strip whitespace and drop empty strings; list() is needed on Python 3, where filter() is lazy
result = list(filter(None, (t.strip() for t in result)))
# result: ['who', 'what', 'side', 'd', 'hello   from the other', 'w', 'this is  slim shady', 'a', 'g', 'oh my']
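The loop above keeps running for as long as a `[` remains, so an unmatched `[` would make it spin forever. The question promises valid input, but here is a hedged sketch of the same peel-the-innermost-layer idea that stops once a pass removes nothing. The function name `extract_with_regex` and the whitespace collapsing are additions for illustration only.

```python
import re

# Matches one bracket pair that contains no other brackets.
INNERMOST = re.compile(r'\[([^\[\]]*)\]')


def extract_with_regex(s):
    """Collect bracket contents by repeatedly stripping innermost pairs."""
    found = []
    while True:
        found.extend(INNERMOST.findall(s))
        s, count = INNERMOST.subn('', s)  # remove the pairs just collected
        if count == 0:                    # nothing left to peel off
            break
    # Collapse whitespace runs and drop groups that are empty.
    return [' '.join(t.split()) for t in found if t.strip()]


print(extract_with_regex('[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'))
```

Each pass rescans the whole string, so the cost grows with the nesting depth times the string length. That is fine for short inputs, though the single-pass stack approach scales better.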
a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
lvl = -1    # current nesting depth; -1 means outside all brackets
words = []
for i in a:
    if i == '[':
        lvl += 1
        words.append('')
    elif i == ']':
        lvl -= 1
    else:
        words[lvl] += i

for word in words:
    print(' '.join(word.split()))  # collapse runs of whitespace
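This gives the output below. Text is grouped by nesting depth, so the sibling brackets who and what, which sit at the same depth, share a line:
this is slim shady
hello from the other side
who what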
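Because the loop above indexes `words` by depth, brackets at the same depth are merged. One possible way to keep them apart, sketched here, is to hold a stack of indices into `words` so that each bracket writes to its own slot. The names `extract_in_opening_order` and `open_ids` are hypothetical, and balanced brackets are assumed.

```python
def extract_in_opening_order(text):
    """Return one string per bracket, ordered by where each bracket opens."""
    words = []      # one entry per bracket, in opening order
    open_ids = []   # stack of indices into `words` for brackets still open
    for ch in text:
        if ch == '[':
            open_ids.append(len(words))
            words.append('')
        elif ch == ']':
            open_ids.pop()  # assumes balanced brackets
        elif open_ids:
            words[open_ids[-1]] += ch
        # Characters outside every bracket are ignored.
    return [' '.join(w.split()) for w in words]


for word in extract_in_opening_order('[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'):
    print(word)
```

For the sample string this lists the outermost text first, in the same order as the expected result in the question.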
You can represent your matches using a tree-like structure.
class BracketMatch:
    def __init__(self, refstr, parent=None, start=-1, end=-1):
        self.parent = parent
        self.start = start
        self.end = end
        self.refstr = refstr
        self.nested_matches = []

    def __str__(self):
        cur_index = self.start+1
        result = ""
        if self.start == -1 or self.end == -1:
            return ""
        for child_match in self.nested_matches:
            if child_match.start != -1 and child_match.end != -1:
                result += self.refstr[cur_index:child_match.start]
                cur_index = child_match.end + 1
            else:
                continue
        result += self.refstr[cur_index:self.end]
        return result

# Main script
haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
root = BracketMatch(haystack)
cur_match = root
for i in range(len(haystack)):
    if '[' == haystack[i]:
        new_match = BracketMatch(haystack, cur_match, i)
        cur_match.nested_matches.append(new_match)
        cur_match = new_match
    elif ']' == haystack[i]:
        cur_match.end = i
        cur_match = cur_match.parent
    else:
        continue

# Here we built the set of matches, now we must print them
nodes_list = root.nested_matches
# So we conduct a BFS to visit and print each match...
while nodes_list != []:
    node = nodes_list.pop(0)
    nodes_list.extend(node.nested_matches)
    print("Match: " + str(node).strip())
The output of this program will be:
Match: this is  slim shady
Match: hello   from the other side
Match: who
Match: what
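The same tree idea can be sketched more compactly by storing each node's own text directly instead of start and end offsets. This is an illustrative alternative, not the class above: `Node`, `build_tree` and `walk` are names invented here, and the brackets are assumed to be balanced.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str = ''                                 # text directly inside this bracket
    children: list = field(default_factory=list)   # nested brackets, in order


def build_tree(s):
    """Parse s into a tree; the root holds any text outside all brackets."""
    root = Node()
    stack = [root]
    for ch in s:
        if ch == '[':
            node = Node()
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ']':
            stack.pop()  # assumes balanced brackets
        else:
            stack[-1].text += ch
    return root


def walk(node, depth=0):
    """Yield (depth, text) pairs in depth-first order, whitespace collapsed."""
    for child in node.children:
        yield depth, ' '.join(child.text.split())
        yield from walk(child, depth + 1)


for depth, text in walk(build_tree('[ this is [ hello [ who ] [what ] from the other side ] slim shady ]')):
    print('  ' * depth + text)
```

Yielding the depth alongside the text keeps the nesting information available to the caller, which a flat list of strings loses.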