Python regex: split a string while keeping the delimiter with the value

Question

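I'm trying to parse a text file of name:value elements into a list of "name:value" strings... Here's the twist: the values will sometimes be multiple words or even multiple lines, and the delimiters are not a fixed set of words. Here's an example of what I'm working with...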

post = "price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"

What I want to get back is...

["price:44.55", "name:John Doe", "title:Super Widget", "description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Here's what I've tried so far...

details = re.findall(r'[\w]+:.*', post, re.DOTALL)
["price:", "44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Not what I want. Or...

details = re.findall(r'[\w]+:.*?', post, re.DOTALL)
["price:", "name:", "title:", "description:"]

Not what I want. Or...

details = re.split(r'([\w]+:)', post)
["", "price:", "44.55", "name:", "John Doe", "title:", "Super Widget", "description:", "This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

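Which is closer, but still no dice. Also, I can deal with an empty list item. So basically, my question is: how do I keep the delimiter with the values when using re.split(), or how do I keep re.findall() from being either too greedy or too stingy?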

Thanks in advance for reading!

Answer 1

Use a lookahead assertion:

>>> re.split(r'\s(?=\w+:)', post)
['price:44.55',
 'name:John Doe',
 'title:Super Widget',
 'description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!']

Of course, it will still fail if one of your values contains a word immediately followed by a colon.
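To make both points concrete, here is a minimal sketch. It first shows a re.findall() counterpart of the split above (a lazy match that stops at the next whitespace-plus-key or at the end of the string), then reproduces the caveat. The `tricky` string is made-up input for illustration, not data from the question.

```python
import re

post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")

# Split on whitespace only when a "key:" follows it (the lookahead consumes nothing).
split_result = re.split(r'\s(?=\w+:)', post)

# findall counterpart: match "key:" and then lazily up to the next " key:" or the end.
found = re.findall(r'\w+:.*?(?=\s\w+:|$)', post, re.DOTALL)

# The caveat: "Note:" inside a value looks exactly like a new key (made-up input).
tricky = "title:Super Widget description:Great widget. Note: batteries not included"
tricky_result = re.split(r'\s(?=\w+:)', tricky)
print(tricky_result)
# ['title:Super Widget', 'description:Great widget.', 'Note: batteries not included']
```

Both patterns rely on the same assumption: any word followed directly by a colon starts a new field. If that does not hold for your data, neither will be reliable.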

Answer 2

@Pavel's answer is nicer, but you could also just merge the results of your last attempt back together:

# drop the leading empty string that re.split() returns
if not details[0]:
    details.pop(0)

# rejoin each "key:" with the value that follows it
merged = [a + b for a, b in zip(details[::2], details[1::2])]
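As an end-to-end sketch of this merge approach, the function below wraps the split and the pairing together. The name `split_keep_keys` is hypothetical, and trimming the whitespace that re.split() leaves at the end of each value is an addition of mine, not part of the snippet above.

```python
import re

def split_keep_keys(text):
    """Hypothetical helper: pair each captured 'key:' with the value after it."""
    parts = re.split(r'(\w+:)', text)
    # parts[0] is whatever precedes the first key (an empty string here);
    # after it, keys sit at odd indexes and values at even indexes.
    keys, values = parts[1::2], parts[2::2]
    # re.split leaves the separating whitespace on each value, so trim it.
    return [key + value.rstrip() for key, value in zip(keys, values)]

post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")
details = split_keep_keys(post)
```

Slicing from index 1 skips the leading element without mutating the list. Note that rstrip() also removes trailing newlines from a value, which may or may not be what you want.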

Answer 1
5, accepted, 2013-02-05 19:21:12

Answer 2
2, 2013-02-05 19:22:37