Using regex lookahead and lookbehind multiple times in Python

Question

My input has the following format (txt1):

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

I want to extract it in the following format:

[['1','Hello is 1)people 2)animals'],['People are 1) hello 2) animals'],['a']]

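So basically, I want the information inside the parentheses, but I have not been able to get it. I also used lookahead and lookbehind to avoid splitting on the numbered markers '1)' or '2)', which is what happened earlier when I used a simple re.split('[\(\)\[\]] statement.

I have been trying the findall function first to check what I get.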

r = re.findall(r'\((?=\').*(?<=\')\)(?=\,)', txt1)

I keep getting:

["('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals')"]

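It seems to ignore the parentheses in the middle. What should I do to get the result I need?

Thank you.

Note:

For the split function, which I intended to use to get the desired output, I get this: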

r = re.split(r'\((?=\').*(?<=\')\)(?=\,)', txt1)

['[', ", ('a')]"]
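
A likely reason for the single long match is that `.*` is greedy: it runs to the last `)` that is preceded by a quote and followed by a comma. The sketch below compares it with a lazy `.*?`, which is an assumed swap and not something from the original post. The lazy version still misses `('a')`, because that group is followed by `]` rather than a comma.

```python
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# Greedy: .* extends to the LAST ')' that satisfies both lookarounds.
greedy = re.findall(r"\((?=').*(?<=')\)(?=,)", txt1)

# Lazy: .*? stops at the FIRST ')' that satisfies both lookarounds.
lazy = re.findall(r"\((?=').*?(?<=')\)(?=,)", txt1)

print(len(greedy))  # 1
print(lazy)
# ["('1','Hello is 1)people 2)animals')", "('People are 1) hello 2) animals')"]
```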

Answer 1

Why regex?

import ast
[list(x) if isinstance(x, tuple) else [x] for x in ast.literal_eval(txt1)]
# => [['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
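
The isinstance check is needed because a parenthesized single string such as ('a') is just a string in Python, not a tuple. A small sketch that shows what ast.literal_eval returns for each element (the variable name parsed is only illustrative):

```python
import ast

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

parsed = ast.literal_eval(txt1)

# Only the first element has a comma inside the parentheses, so only it is a tuple.
kinds = [type(x).__name__ for x in parsed]
print(kinds)  # ['tuple', 'str', 'str']
```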

If you insist on regex, this should work unless the strings contain escaped quotes:

[re.findall(r"'[^']*'", x) for x in re.findall(r"\(('[^']*'(?:,\s*'[^']*')*)\)", txt1)]
# => [["'1'", "'Hello is 1)people 2)animals'"], ["'People are 1) hello 2) animals'"], ["'a'"]]
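
Note that the result above keeps the single quotes inside each string. If they are not wanted, one possible variation (a sketch, not part of the original answer) is to put a capturing group inside the quotes in the inner findall:

```python
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# Outer pattern: the quoted items between each pair of parentheses.
groups = re.findall(r"\(('[^']*'(?:,\s*'[^']*')*)\)", txt1)

# Inner pattern: capture only the text between the quotes.
result = [re.findall(r"'([^']*)'", g) for g in groups]
print(result)
# [['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
```

Like the original pattern, this assumes the strings contain no escaped quotes.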

Answer 2

Another solution that does not use regex:

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"
replace_pairs = {
    "('": "'",
    "'), ": '#',
    '[': '',
    ']': '',
    "'": '',
}
for k, v in replace_pairs.items():
    txt1 = txt1.replace(k, v)

txt1 = txt1[:-1].split('#') # the last char is a parenthesis
print([i.split(',') for i in txt1])

Output:

[['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]

Note: if the input is more complex than what you have shown here, this may not work.

Answer 3

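You can try the pattern \(([^(]+)\)

Explanation:

\( - matches ( literally

(...) - capturing group

[^(]+ - matches one or more characters other than (

\) - matches ) literally

Then use the replacement pattern [\1], which puts the first capturing group (backreference \1) inside square brackets.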
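
In Python, the pattern and replacement could be applied with re.sub, roughly as sketched below. The result is still a string; passing it to ast.literal_eval to get a real list is an extra step assumed here, not something the answer states.

```python
import ast
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# Replace each (...) group with [...]; \1 is the text captured between the parentheses.
bracketed = re.sub(r"\(([^(]+)\)", r"[\1]", txt1)
print(bracketed)
# [['1','Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]

# Optional: turn the rewritten string into an actual list of lists.
as_list = ast.literal_eval(bracketed)
print(as_list)
```

This works on the sample because [^(]+ is greedy and backtracks to the last ) before the next (, so the inner 1) and 2) markers stay inside the group.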

