Python - Regex - match characters between certain characters

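I have a text file, and I want to match/find/parse all characters between certain characters ( [\n" text to match "\n] ). The texts themselves can differ a lot in structure and characters (they can contain every possible character).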

I posted this question before (sorry for the duplicate), but the problem could not be solved so far, so I am now trying to state it more precisely.

The text in the file is built up like this:

    test =""" 
        [
        "this is a text and its supposed to contain every possible char."
        ], 
        [
        "like *.;#]§< and many "" more."
        ], 
        [
        "plus there are even
newlines

in it."
        ]"""

My desired output is a list (for example) with each text between the delimiters as one element, like this:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']

I tried to solve this with regex. Here are the two attempts I came up with, each followed by its output:

my_list = re.findall(r'(?<=\[\n {8}\").*(?=\"\n {8}\])', test)
print (my_list)

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.']

Okay, this one is close. It lists the first two elements, but unfortunately not the third one, because it contains newlines.

my_list = re.findall(r'(?<=\[\n {8}\")[\s\S]*(?=\"\n {8}\])', test)
print (my_list)

['this is a text and its supposed to contain every possible char."\n        ], \n        [\n        "like *.;#]§< and many "" more."\n        ], \n        [\n        "plus there are even\nnewlines\n        \n        in it.']

Okay, this time every text is included, but the list has only one element, and the lookahead does not seem to work the way I thought it would.

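So what is the right regex to get my desired output? And why does the second approach seem to ignore the lookahead?

Or is there a cleaner, faster way to get what I want (BeautifulSoup or other methods)?

Thanks a lot for any help and hints.

I am using Python 3.6.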

You should use the DOTALL flag so that `.` also matches newlines:

print(re.findall(r'\[\n\s+"(.*?)"\n\s+\]', test, re.DOTALL))

Output:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even\nnewlines\n\nin it.']
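As for why the second attempt in the question returns a single element: the lookahead does work, but `[\s\S]*` is greedy. It first runs to the end of the string and then backtracks only until the lookahead succeeds, which happens at the last `"\n        ]`. Making the quantifier lazy (`*?`) stops at the first one instead. A minimal sketch, assuming the input is indented with eight spaces as in the question (the names `greedy`, `lazy`, and `cleaned` are only illustrative):

```python
import re

# Sample rebuilt from the question; eight-space indentation is assumed.
test = (
    '\n'
    '        [\n'
    '        "this is a text and its supposed to contain every possible char."\n'
    '        ], \n'
    '        [\n'
    '        "like *.;#]§< and many "" more."\n'
    '        ], \n'
    '        [\n'
    '        "plus there are even\n'
    'newlines\n'
    '\n'
    'in it."\n'
    '        ]'
)

# Greedy: [\s\S]* grabs as much as possible, so the lookahead is satisfied
# by the LAST closing delimiter and everything collapses into one match.
greedy = re.findall(r'(?<=\[\n {8}")[\s\S]*(?="\n {8}\])', test)

# Lazy: [\s\S]*? stops at the FIRST position where the lookahead succeeds.
lazy = re.findall(r'(?<=\[\n {8}")[\s\S]*?(?="\n {8}\])', test)

# Collapse newlines and repeated spaces to get the list asked for.
cleaned = [' '.join(s.split()) for s in lazy]

print(len(greedy))  # 1
print(len(lazy))    # 3
print(cleaned[2])   # plus there are even newlines in it.
```

The DOTALL pattern above relies on the same idea: its `(.*?)` is lazy as well.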

You can use the pattern

(?s)\[[^"]*"(.*?)"[^]"]*\]

to capture everything between the `"`s inside the brackets:

https://regex101.com/r/SguEAU/1
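To make the pattern easier to read, here is the same expression written with `re.VERBOSE` and one comment per piece. This is only a sketch: `PATTERN` and the short `sample` string are made up for illustration, and `re.DOTALL` stands in for the inline `(?s)`.

```python
import re

PATTERN = re.compile(r'''
    \[          # opening bracket of a block
    [^"]*       # whitespace and newlines up to the first double quote
    "           # opening quote
    (.*?)       # group 1: the text, as short as possible
    "           # candidate for the closing quote
    [^]"]*      # anything that is neither a closing bracket nor a quote
    \]          # closing bracket of the block
''', re.DOTALL | re.VERBOSE)

# Hypothetical sample: the first text contains both a bracket and a quote.
sample = '[\n    "a ]" b"\n],\n[\n    "two\nlines"\n]'

print(PATTERN.findall(sample))  # ['a ]" b', 'two\nlines']
```

A quote inside the text is accepted as the closing quote only if nothing but non-quote, non-bracket characters separate it from the next `]`, which is why the `"` inside the first text does not end the match early.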

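Then you can use a list comprehension with re.sub to replace each run of whitespace characters (including newlines) in every captured substring with a single normal space: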

import re

test ="""
    [
    "this is a text and its supposed to contain every possible char."
    ],
    [
    "like *.;#]§< and many "" more."
    ],
    [
    "plus there are even
newlines

in it."
    ]"""

output = [re.sub(r'\s+', ' ', m.group(1)) for m in re.finditer(r'(?s)\[[^"]*"(.*?)"[^]"]*\]', test)]

Result:

['this is a text and its supposed to contain every possible char.', 'like *.;#]§< and many "" more.', 'plus there are even newlines in it.']
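Since the text comes from a file, the two steps can be wrapped in a small helper. The following is a sketch only: `extract_texts`, `BLOCK_RE`, `WHITESPACE_RE`, and the sample string are hypothetical names and data, not part of the question.

```python
import re

# Same pattern as above, compiled once so it can be reused across calls.
BLOCK_RE = re.compile(r'(?s)\[[^"]*"(.*?)"[^]"]*\]')
WHITESPACE_RE = re.compile(r'\s+')


def extract_texts(raw):
    """Return the quoted text of every block, with whitespace runs collapsed."""
    return [WHITESPACE_RE.sub(' ', m.group(1)) for m in BLOCK_RE.finditer(raw)]


# Hypothetical sample in the same shape as the question's data.
raw = (
    '[\n    "plus there are even\nnewlines\n\nin it."\n],\n'
    '[\n    "like *.;#]§< and many "" more."\n]'
)

print(extract_texts(raw))
```

For a real file, the string could be read with `pathlib.Path(...).read_text()` and passed to the helper unchanged.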

