Python - Regex - Matching characters between certain characters

Question

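I have a text file and I want to match/find/parse all characters that sit between certain characters ( [\n" text to match "\n] ). The text itself can vary a lot in structure and characters (it can contain every possible character).

I have posted this question before (sorry for the duplicate), but so far the problem could not be solved, so I am now trying to state it more precisely.

The text in the file is built up like this: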

    test =""" 
        [
        "this is a text and its supposed to contain every possible char."
        ], 
        [
        "like *.;#]§< and many "" more."
        ], 
        [
        "plus there are even
newlines

in it."
        ]"""

My desired output would be a list (for example) with each text between the delimiters as an element, like this:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']

I tried to solve this with regex and came up with two attempts, shown here with their output:

my_list = re.findall(r'(?<=\[\n {8}\").*(?=\"\n {8}\])', test)
print (my_list)

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.']

Well, this one is close. It lists the first two elements, but unfortunately not the third, because that one contains newlines.

my_list = re.findall(r'(?<=\[\n {8}\")[\s\S]*(?=\"\n {8}\])', test)
print (my_list)

['this is a text and its supposed to contain every possible char."\n        ], \n        [\n        "like *.;#]§< and many "" more."\n        ], \n        [\n        "plus there are even\nnewlines\n        \n        in it.']

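Okay, this time every element is included, but the list contains only one element, and the lookahead does not seem to work the way I thought it would.

So what is the right regex to get my desired output? And why does the second approach not respect the lookahead?

Or is there an even cleaner, faster way to get what I want (BeautifulSoup or other methods)?

Thank you very much for any help and hints.

I am using Python 3.6.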

Answer 1

You should use the DOTALL flag so that . also matches newlines, together with a non-greedy group:

print(re.findall(r'\[\n\s+"(.*?)"\n\s+\]', test, re.DOTALL))

Output:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even\nnewlines\n\nin it.']
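As for why the second attempt in the question returned a single element: as far as I can tell, the lookahead itself works, but [\s\S]* is greedy, so it runs to the end of the string and backtracks only as far as the last closing delimiter. A minimal sketch of the difference follows. The sample string is rebuilt with explicit escapes and assumes 8 spaces of indentation, as the {8} in the question's pattern implies.

```python
import re

# Sample text with explicit escapes so the whitespace is unambiguous
# (assumption: 8 spaces of indentation, matching ` {8}` in the question).
test = (
    ' \n'
    '        [\n'
    '        "this is a text and its supposed to contain every possible char."\n'
    '        ], \n'
    '        [\n'
    '        "like *.;#]§< and many "" more."\n'
    '        ], \n'
    '        [\n'
    '        "plus there are even\nnewlines\n\nin it."\n'
    '        ]'
)

# Greedy: [\s\S]* grabs everything, then backtracks to the LAST closing delimiter.
greedy = re.findall(r'(?<=\[\n {8}")[\s\S]*(?="\n {8}\])', test)

# Lazy: [\s\S]*? stops at the FIRST position where the lookahead succeeds.
lazy = re.findall(r'(?<=\[\n {8}")[\s\S]*?(?="\n {8}\])', test)

print(len(greedy))  # one long match spanning all three blocks
print(lazy)         # three separate elements
```

So adding a single ? after the * is enough to make the lookaround version behave like the capture-group version above.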

Answer 2

You can use the pattern

(?s)\[[^"]*"(.*?)"[^]"]*\]

to capture each element inside the double quotes within the brackets:

https://regex101.com/r/SguEAU/1
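For readers who prefer not to follow the link, here is the same pattern spelled out piece by piece with re.VERBOSE. This is only an annotated sketch; the sample string is made up for illustration.

```python
import re

# Same pattern as above; re.DOTALL stands in for the inline (?s) flag.
pattern = re.compile(r'''
    \[          # opening bracket
    [^"]*       # anything up to the first double quote (newline, indentation)
    "           # opening quote
    (.*?)       # group 1: the text, as short as possible
    "           # closing quote
    [^]"]*      # only characters that are neither a closing bracket nor a quote
    \]          # closing bracket
''', re.DOTALL | re.VERBOSE)

# Hypothetical sample: the second element contains both a bracket and a quote.
sample = '[\n    "first"\n], [\n    "second ]" tricky"\n]'
print(pattern.findall(sample))
```

The [^]"]* part is what keeps a quote in the middle of the text from ending the match early: after that quote, the engine cannot reach a closing bracket without crossing another quote, so it extends the group instead.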

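Then you can use a list comprehension with re.sub to replace each run of whitespace characters (including newlines) in every captured substring with a single normal space: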

import re

test ="""
    [
    "this is a text and its supposed to contain every possible char."
    ],
    [
    "like *.;#]§< and many "" more."
    ],
    [
    "plus there are even
newlines

in it."
    ]"""

output = [re.sub(r'\s+', ' ', m.group(1)) for m in re.finditer(r'(?s)\[[^"]*"(.*?)"[^]"]*\]', test)]

Result:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']
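If this is needed in more than one place, the two steps could be wrapped in a small helper. The sketch below is one possible way to do it; the names extract_quoted and _QUOTED are made up for illustration, and it swaps re.sub for str.split plus join to collapse the whitespace.

```python
import re

# The pattern from this answer, compiled once; re.DOTALL replaces the inline (?s).
_QUOTED = re.compile(r'\[[^"]*"(.*?)"[^]"]*\]', re.DOTALL)


def extract_quoted(text):
    """Return each bracketed, quoted text with whitespace runs collapsed."""
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace, so join() leaves single spaces only.
    return [' '.join(m.group(1).split()) for m in _QUOTED.finditer(text)]


sample = '[\n    "plus there are even\nnewlines\n\nin it."\n    ]'
print(extract_quoted(sample))
```

One difference to be aware of: split/join also strips whitespace at the very start and end of each element, whereas re.sub(r'\s+', ' ', ...) would leave a single space there.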

Answer 1: 1, accepted, 2018-12-07 09:39:44

Answer 2: 1, 2018-12-07 09:48:15
