python regex split string while keeping delimiter with value
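I'm trying to parse a text file of name:value elements into a list of "name:value" strings. Here's the twist: the values will sometimes be multiple words or even span multiple lines, and the delimiters are not a fixed set of words. Here's an example of what I'm trying to work with...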
post = "price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"
What I want to get back is...
["price:44.55", "name:John Doe", "title:Super Widget", "description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]
Here's what I've tried so far...
details = re.findall(r'[\w]+:.*', post, re.DOTALL)
["price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]
Not what I want. Or...
details = re.findall(r'[\w]+:.*?', post, re.DOTALL)
["price:", "name:", "title:", "description:"]
Not what I want. Or...
details = re.split(r'([\w]+:)', post)
["", "price:", "44.55", "name:", "John Doe", "title:", "Super Widget", "description:", "This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]
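Which is closer, but still no dice. Also, I can deal with an empty list item. So, basically, my question is: how do I keep the delimiter with the values on re.split(), or how do I keep re.findall() from being either too greedy or too stingy?
Thanks in advance for reading!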
Use a look-ahead assertion:
>>> re.split(r'\s(?=\w+:)', post)
['price:44.55',
'name:John Doe',
'title:Super Widget',
'description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!']
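The same look-ahead idea can also keep re.findall() from being too greedy or too stingy. The following is only a sketch under the assumption that every key is a run of word characters followed by a colon; the `pattern` name is introduced here for illustration. A lazy `.*?` is stopped by a look-ahead that accepts either whitespace preceding the next key or the end of the string.

```python
import re

post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")

# key, colon, then as little text as possible (lazy) up to either
# whitespace that precedes the next "word:" or the end of the string
pattern = re.compile(r'\w+:.*?(?=\s+\w+:|\Z)', re.DOTALL)

details = pattern.findall(post)
for item in details:
    print(repr(item))
```

re.DOTALL is needed here so that the lazy `.*?` can run across the line break inside the description value.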
Of course, it will still fail if one of your values contains a word immediately followed by a colon.
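To make that failure mode concrete, here is a small sketch with a made-up input (`tricky` is hypothetical, not from the question). It also shows one possible mitigation that rests on an assumption the question does not confirm: that real keys are always followed directly by a non-space character, while colons inside prose are followed by a space.

```python
import re

# hypothetical input whose description value contains "also:"
tricky = "price:44.55 description:See also: the manual name:John Doe"

# the plain look-ahead also splits before "also:", which belongs to a value
naive = re.split(r'\s(?=\w+:)', tricky)

# assumption: a real key is followed immediately by a non-space character
stricter = re.split(r'\s(?=\w+:\S)', tricky)

print(naive)
print(stricter)
```

This is still a heuristic. A value such as "ships at 10:30" would be split before "10:30", so if the set of keys is known in advance, matching those names explicitly would likely be more robust.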
@Pavel's answer is nicer, but you could also just merge together the results of your last attempt:
# drop the leading empty string that re.split() produces
if not details[0]:
    details.pop(0)
# glue each "name:" delimiter back onto the value that follows it
details = [a + b for a, b in zip(details[::2], details[1::2])]
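For completeness, the merge approach might be wrapped up as a self-contained function along these lines. The name `split_keep_delimiter` is made up for this sketch, and it assumes the text starts with a key; any leading text before the first key would throw the pairing off.

```python
import re

def split_keep_delimiter(text):
    """Split on "word:" keys and re-attach each key to its value."""
    # the capturing group makes re.split() keep the delimiters in the result
    parts = re.split(r'(\w+:)', text)
    # drop the empty string that appears before the first key
    if parts and not parts[0]:
        parts.pop(0)
    # pair every key with the value after it; strip() removes whitespace
    # that the split leaves between a value and the next key
    return [(key + value).strip() for key, value in zip(parts[::2], parts[1::2])]

post = ("price:44.55 name:John Doe title:Super Widget "
        "description:This widget slices, dices, and drives your kids "
        "to soccer practice\r\nIt even comes with Super Widget Mini!")

result = split_keep_delimiter(post)
for item in result:
    print(repr(item))
```

Like the look-ahead version, this treats any word followed by a colon as a key, so the same caveat about colons inside values applies.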