Regex to match all groups separated by a delimiter?

Question

I have a special format for encoding, and I would like a regex that extracts the encoded information. I use ':' as a special character that separates different "blocks" of information. For example:

s = 'P:1:a:3:test_data'

Should be split into:

['P','1','a','3','test_data']

I can use:

s.split(':')

However, a single ':' can also be encoded as a block (there will never be more than one ':' grouped together, so there is no ambiguity). So, for example:

s = 'P:1:::3:test_data'

Should give:

['P','1',':','3','test_data']

Using split(':') fails here:

['P', '1', '', '', '3', 'test_data']

What is the best way to capture that ':'? I am not very strong with regexes. I know a regex can match at least one character using '+', but I am very confused about how to piece it all together. Better yet, is there a good way of doing this without regex? I guess I could always iterate over the list, check for consecutive empty strings and combine them into ':'. Is there a more elegant way of doing it?

Thanks
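For comparison, the non-regex idea from the question (split on ':' and merge consecutive empty strings back into ':') can be sketched roughly like this. It is a minimal sketch that assumes no block is ever empty, so an empty piece can only come from an encoded ':' block, which always yields exactly two empty pieces; decode_without_regex is a hypothetical name.

```python
def decode_without_regex(s):
    """Split on ':' and turn each pair of consecutive empty pieces into ':'.

    Assumption (not stated in the question): blocks are never empty, so an
    empty piece can only come from an encoded ':' block.
    """
    pieces = s.split(':')
    fields = []
    i = 0
    while i < len(pieces):
        # An encoded ':' between two separators leaves two empty pieces.
        if pieces[i] == '' and i + 1 < len(pieces) and pieces[i + 1] == '':
            fields.append(':')
            i += 2
        else:
            fields.append(pieces[i])
            i += 1
    return fields

print(decode_without_regex('P:1:a:3:test_data'))    # ['P', '1', 'a', '3', 'test_data']
print(decode_without_regex('P:1:::3:test_data'))    # ['P', '1', ':', '3', 'test_data']
print(decode_without_regex('P:1:::::3:test_data'))  # ['P', '1', ':', ':', '3', 'test_data']
```

The loop consumes two pieces at a time whenever it sees a pair of empty strings, which is why a run of five colons (two encoded ':' blocks) decodes correctly as well.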

Answer 1

For your specific case, you can use negative lookarounds to restrict which colons you split on, (?<!:):|:(?!:), i.e. a colon that is not both preceded and followed by another colon:

import re
s = 'P:1:a:3:test_data'
s1 = 'P:1:::3:test_data'

re.split("(?<!:):|:(?!:)", s)
# ['P', '1', 'a', '3', 'test_data']

re.split("(?<!:):|:(?!:)", s1)
# ['P', '1', ':', '3', 'test_data']
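To make the lookaround behaviour concrete, the sketch below (SEP is just an illustrative name) prints the positions of the colons the pattern matches. It also suggests why this is a solution for the specific case only: it relies on the question's guarantee that an encoded ':' block is never next to another one.

```python
import re

# The separator pattern from the answer: a colon that is not BOTH
# preceded and followed by another colon.
SEP = re.compile(r"(?<!:):|:(?!:)")

s1 = 'P:1:::3:test_data'
# Indices of the colons treated as separators. In the run ':::' (indices 3-5)
# the outer two match and the middle one (index 4) is left as the block.
print([m.start() for m in SEP.finditer(s1)])  # [1, 3, 5, 7]

# Limitation: two encoded ':' blocks in a row produce a run of five colons,
# and only the outer colons of that run are split on.
print(SEP.split('P:1:::::3:test_data'))  # ['P', '1', ':::', '3', 'test_data']
```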

Another, more general option, which can also handle more than one encoded ':' in a row, is re.findall with (.+?)(?::|$), i.e. lazily match at least one character until a colon is found or the end of the string is reached:

re.findall('(.+?)(?::|$)', 'P:1:::3:test_data')
# ['P', '1', ':', '3', 'test_data']

re.findall('(.+?)(?::|$)', 'P:1:::::3:test_data')
# ['P', '1', ':', ':', '3', 'test_data']
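One way to check the findall approach is a round trip against an encoder. The question does not show its encoder, so this sketch assumes (hypothetically) that encoding is just ':'.join(fields), where every block is non-empty and a block containing ':' is exactly ':'; encode, decode and FIELD are illustrative names.

```python
import re

# Hypothetical encoder: assumes the format is simply ':'.join(fields).
def encode(fields):
    return ':'.join(fields)

# The findall-based decoder from the answer, compiled once.
FIELD = re.compile(r'(.+?)(?::|$)')

def decode(s):
    return FIELD.findall(s)

samples = [
    ['P', '1', 'a', '3', 'test_data'],
    ['P', '1', ':', '3', 'test_data'],
    ['P', '1', ':', ':', '3', 'test_data'],
    [':', 'P'],  # ':' as the first block
    ['P', ':'],  # ':' as the last block
]
for fields in samples:
    assert decode(encode(fields)) == fields, fields
print("round trip ok")
```

Under these assumptions every sample decodes back to its original list; a block such as 'a:b' that mixes ':' with other characters would not survive the round trip.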

Answer 1: accepted, 2017-09-18 01:16:26
