Python - Regex - match characters between certain characters
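I have a text file and I want to match/find/parse all the characters between certain delimiters ( [\n" text to match "\n] ). The text itself can vary a lot in structure and content (it may contain any possible character).
I posted this question before (sorry for the duplicate), but it has not been solved so far, so I am now trying to state the problem more precisely.
The text in the file is structured like this: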
test ="""
[
"this is a text and its supposed to contain every possible char."
],
[
"like *.;#]§< and many "" more."
],
[
"plus there are even
newlines
in it."
]"""
The output I want is a list (for example) in which each text between the delimiters is one element, like this:
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']
I tried to solve this with a regex and came up with two attempts, shown here with the output each one produces:
my_list = re.findall(r'(?<=\[\n {8}\").*(?=\"\n {8}\])', test)
print (my_list)
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.']
OK, this one is close. It lists the first two elements, but unfortunately not the third, because that one contains newlines.
my_list = re.findall(r'(?<=\[\n {8}\")[\s\S]*(?=\"\n {8}\])', test)
print (my_list)
['this is a text and its supposed to contain every possible char."\n ], \n [\n "like *.;#]§< and many "" more."\n ], \n [\n "plus there are even\nnewlines\n \n in it.']
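OK, this time every element is included, but the list contains only a single element, and the lookahead does not seem to work the way I expected.
So what is the right regex to get the output I want? Why does the lookahead not take effect in the second attempt?
Or is there a cleaner, faster way to get what I want (BeautifulSoup or some other method)?
Many thanks for any help and hints.
I am using Python 3.6.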
You should use the re.DOTALL flag so that . also matches newlines:
print(re.findall(r'\[\n\s+"(.*?)"\n\s+\]', test, re.DOTALL))
Output:
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even\nnewlines\n\nin it.']
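As a rough illustration of why the flag and the lazy quantifier both matter, the sketch below runs three variants of this pattern on a small made-up string. The name sample and its indentation are assumptions for illustration, not the original file.

```python
import re

# Hypothetical sample that mimics the layout in the question (indentation assumed).
sample = '[\n    "first"\n    ],\n    [\n    "second\nline"\n    ]'

# Without DOTALL, "." stops at a newline, so the multi-line element is missed.
no_flag = re.findall(r'\[\n\s+"(.*?)"\n\s+\]', sample)

# With DOTALL but a greedy ".*", the match runs on to the last closing delimiter.
greedy = re.findall(r'\[\n\s+"(.*)"\n\s+\]', sample, re.DOTALL)

# With DOTALL and a lazy ".*?", each element stops at its own closing delimiter.
lazy = re.findall(r'\[\n\s+"(.*?)"\n\s+\]', sample, re.DOTALL)

print(no_flag)
print(greedy)
print(lazy)
```

This is also the likely reason the second attempt in the question returned one long string: [\s\S]* is greedy, so the lookahead is only satisfied at the last closing delimiter rather than the first one.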
You can use the pattern
(?s)\[[^"]*"(.*?)"[^]"]*\]
to capture each element enclosed in double quotes inside the square brackets:
https://regex101.com/r/SguEAU/1
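To make the parts of this pattern easier to follow, here is a sketch of the same expression written with re.VERBOSE and one comment per part. The (?s) prefix is expressed as the re.DOTALL flag, and the sample string is made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(r'''
    \[          # opening square bracket
    [^"]*       # anything that is not a double quote (newline, indentation)
    "           # opening double quote
    (.*?)       # the text to capture, as short as possible
    "           # a double quote that may be the closing one
    [^]"]*      # only characters other than ] and " may follow it
    \]          # closing square bracket
''', re.DOTALL | re.VERBOSE)

# Hypothetical sample: one element with embedded quotes, one with a newline.
sample = '[\n    "a "" b"\n],\n[\n    "c\nd"\n]'

results = pattern.findall(sample)
print(results)
```

Because [^]"]* cannot cross another double quote, a quote in the middle of the text is rejected as a closing quote and the lazy group keeps growing until it reaches the quote that is actually followed by the closing bracket.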
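Then you can use a list comprehension with re.sub to replace each run of whitespace characters (including newlines) in every captured substring with a single ordinary space: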
import re

test = """
[
"this is a text and its supposed to contain every possible char."
],
[
"like *.;#]§< and many "" more."
],
[
"plus there are even
newlines
in it."
]"""
# Collapse every run of whitespace in each captured group into one space.
output = [re.sub(r'\s+', ' ', m.group(1)) for m in re.finditer(r'(?s)\[[^"]*"(.*?)"[^]"]*\]', test)]
Result:
['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']