
python regex - characters between certain characters


Edit: I should add that the strings in the test can contain every possible character (i.e. * + $ § € / etc.), so I think a regex would be best.

I'm using a regex to find all characters between certain characters, namely [" and "]. My example looks like this:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

The expected output should look like this:

['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']

My code (including the regex) looks like this:

import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)

My result looks like this:

['this is a text and its supposed to contain every possible char."]', 'another one after a newline."]', 'and another one even with']

So there are two problems:

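1) It doesn't remove the "] at the end of the text, although I expected (?="\]) to take care of that.

2) It doesn't capture the third text in brackets, presumably because of the newlines. So far I haven't been able to capture them: when I tried .*\n, it gave me an empty string.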
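A sketch of why problem 1 happens and how the pattern could be tightened, assuming the first test string above. The * after (?="\]) makes the lookahead optional, so nothing stops the greedy .* from running to the end of the line, and the trailing [^ ,\n] then consumes the closing ]. Making the lookahead required and the quantifier non-greedy fixes problem 1; problem 2 remains, because . does not match \n by default.

```python
import re

test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

# The lookahead is no longer optional (no trailing "*"), and ".*?" is
# non-greedy, so each match ends right before the first '"]'.
fixed = re.findall(r'(?<=\[").*?(?="\])', test)
print(fixed)
# Only two items come back: without re.DOTALL, "." cannot cross the
# newlines inside the third bracket.
```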

I'd appreciate any help or hints on this problem. Thanks in advance.

Btw, I'm using Python 3.6 in Anaconda Spyder with the latest regex (2018).

Edit 2: A variation of the test:

test = """[
    "this is a text and its supposed to contain every possible char."
    ], 
    [
    "another one after a newline."
    ], 

    [
    "and another one even with
    newlines

    in it."
    ]"""

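Again I have a hard time getting rid of the newlines. I guessed that whitespace could be handled with \s, so I imagined a regex like this would solve it: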

my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)

But this only returns an empty list. How can I get the expected output above from this input?
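A sketch for the Edit 2 input, assuming Python's built-in re module. The lookbehind \[\S\s\" requires exactly one non-whitespace character and then one whitespace character between [ and ", but the input has a newline plus four spaces there, so it never matches. Widening it to \s* is not an option, because re only allows fixed-width lookbehinds. Consuming the brackets in the match and capturing only the text in a group avoids the lookbehind altogether; re.DOTALL lets . cross newlines, and split()/join() collapses the leftover whitespace.

```python
import re

test = """[
    "this is a text and its supposed to contain every possible char."
    ],
    [
    "another one after a newline."
    ],

    [
    "and another one even with
    newlines

    in it."
    ]"""

# The pattern from the question finds nothing on this input.
original = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print(original)  # []

# A variable-width lookbehind is rejected by re.
try:
    re.compile(r'(?<=\[\s*")')
except re.error as exc:
    print("re.error:", exc)

# Consume '[', optional whitespace and '"', capture lazily up to the next
# '"', then consume optional whitespace and ']'.
raw = re.findall(r'\[\s*"(.*?)"\s*\]', test, flags=re.DOTALL)

# Collapse every whitespace run (newlines included) into one space.
my_list = [" ".join(s.split()) for s in raw]
print(my_list)
```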

You can try this, mate:

(?<=\[\")[\w\s.]+(?=\"\])


What you missed in your regex is that the . in .* doesn't match newlines.
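A minimal illustration of that point on a made-up two-line string: by default . stops at \n, and the re.DOTALL flag (also available as re.S) lifts that restriction.

```python
import re

text = "start\nend"

# By default "." matches any character except "\n".
default = re.findall(r"start.*end", text)
# re.DOTALL (alias re.S) makes "." match "\n" as well.
dotall = re.findall(r"start.*end", text, flags=re.DOTALL)

print(default)  # []
print(dotall)   # ['start\nend']
```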

PS: I didn't match special characters. If you want that, it's very easy to add.

This one also matches special characters:

(?<=\[\")[\w\W]+?(?=\"\])
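Since [\w\W] matches any character, newlines included, this pattern returns the third text with its line breaks and indentation intact. A sketch of the extra step needed to get the exact expected output, assuming the first test string from the question: collapse each whitespace run into one space. The lookbehind also requires " immediately after [, so input shaped like Edit 2 yields nothing.

```python
import re

test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

pattern = r'(?<=\[\")[\w\W]+?(?=\"\])'
raw = re.findall(pattern, test)
# The third item still carries the original newlines and indentation.
print(repr(raw[2]))

# Collapse every whitespace run (spaces, tabs, newlines) into one space.
my_list = [" ".join(s.split()) for s in raw]
print(my_list)

# With whitespace between [ and " (as in the Edit 2 input), nothing matches.
print(re.findall(pattern, '[\n    "x"\n    ]'))  # []
```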


If you can also accept a solution that doesn't use regex, you can try:

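import ast

result = []
# Collapse every whitespace run (including the newlines) to a single space,
# then parse the text as a Python literal: a tuple of one-item lists.
# ast.literal_eval only accepts literals, so unlike eval() it cannot
# execute arbitrary code hidden in the input.
for l in ast.literal_eval(' '.join(test.split())):
    result.extend(l)

print(result)
#  ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']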

So here's my idea:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

for i in test.replace('\n', '').replace('    ', ' ').split(','):
    print(i.lstrip(r' ["').rstrip(r'"]'))

This prints the following to the screen:

this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.

If you need a list of these exact strings, we can modify it to:

newList = []
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
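Because Edit 1 says the bracketed text may contain any character, it's worth checking this approach against a comma inside the text. A small hypothetical input shows the split on , breaking one item into two, while a lookaround regex like the ones in the other answers keeps it whole.

```python
import re

# Hypothetical input in which the first bracketed text contains a comma.
test = '["one, two"], ["three"]'

# The split-based loop from this answer.
split_based = []
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    split_based.append(i.lstrip(' ["').rstrip('"]'))
print(split_based)  # ['one', 'two', 'three']

# A lookaround regex like the ones in the other answers keeps the comma.
regex_based = re.findall(r'(?<=\[")[\w\W]+?(?="\])', test)
print(regex_based)  # ['one, two', 'three']
```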

