Capturing all repetitions of a group with Python regular expressions

Question

I have input in the following format:

<integer>: <word> ... # <comment>

where ... can represent one or more <word> strings.

Here is an example:

1: foo bar baz # This is an example

I want to split this input apart with a regular expression and return a tuple that contains the integer followed by each word. For the above example, I want:

(1, 'foo', 'bar', 'baz')

This is what I have tried:

>>> import re
>>> re.match(r'(\d+):( \w+)+', '1: foo bar baz # This is an example').groups()
('1', ' baz')

I am getting only the integer and the last word. How do I get the integer and all the words that the regex matches?
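Python's re module stores only the last repetition of a repeated capturing group, which is why the attempt above returns ' baz' instead of every word. One common workaround is to capture the whole run of words as a single group and split it afterwards. The sketch below assumes every line follows the "<integer>: <word> ... # <comment>" format; the names LINE_RE and parse_line are hypothetical, and the comment is treated as optional.

```python
import re

# Group 1: the integer. Group 2: the whole run of words as ONE string,
# because a repeated group would keep only its last repetition.
# Everything from '#' to the end of the line is matched and ignored.
LINE_RE = re.compile(r'(\d+):((?:\s+\w+)+)\s*(?:#.*)?$')

def parse_line(line):
    """Return (int, word, word, ...) or raise ValueError on a malformed line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f'unexpected format: {line!r}')
    # group(2) is ' foo bar baz'; str.split() breaks it on whitespace.
    return (int(m.group(1)), *m.group(2).split())

print(parse_line('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```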

Answer 1

Non-regex solution:

>>> s = '1: foo bar baz # This is an example'
>>> a, _, b = s.partition(':')
>>> [int(a)] + b.partition('#')[0].split()
[1, 'foo', 'bar', 'baz']
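The snippet above produces a list, while the question asks for a tuple. A small wrapper (the name parse_with_partition is hypothetical) can unpack the same pieces into a tuple. It also relies on the fact that str.partition returns (whole_string, '', '') when the separator is missing, so a line without a "#" comment needs no special case.

```python
def parse_with_partition(line):
    """Split '<integer>: <word> ... # <comment>' into (int, word, ...) without regex."""
    head, _, rest = line.partition(':')
    # partition('#') yields (rest, '', '') when there is no '#',
    # so a missing comment is handled automatically.
    words = rest.partition('#')[0].split()
    return (int(head), *words)

print(parse_with_partition('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
print(parse_with_partition('2: spam eggs'))
# (2, 'spam', 'eggs')
```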

Answer 2

You can probably make it a lot clearer with simple string manipulation:

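my_string = '1: foo bar baz # This is an example'
body = my_string.split('#')[0]                 # drop the comment
num_string, word_string = body.split(':', 1)   # split on the first ':' only
num = int(num_string)
words = word_string.split()                    # split on any run of whitespace

print(num)
print(words)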

Output:

# num = 1
# words = ['foo', 'bar', 'baz']

Answer 3

The trick here is to use lookaheads: find either digits (followed by a colon) or words (followed by any mix of word characters and whitespace, then a hash):

import re

s = "1: foo bar baz # This is an example"
print(re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s))
# ['1', 'foo', 'bar', 'baz']

The only thing left is to convert "1" to an int, which the regex itself cannot do.
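The conversion can be done in plain Python right after findall. The sketch below (the names TOKEN_RE and parse_with_lookaheads are hypothetical) reuses the lookahead pattern from this answer unchanged and turns the first token into an int, producing the tuple the question asks for.

```python
import re

# Lookahead pattern from the answer: digits before ':' or words before '#'.
TOKEN_RE = re.compile(r'\d+(?=:)|\w+(?=[\w\s]*#)')

def parse_with_lookaheads(line):
    """Return (int, word, ...) using findall plus an int() conversion."""
    tokens = TOKEN_RE.findall(line)
    # findall only ever returns strings; convert the integer here.
    return (int(tokens[0]), *tokens[1:])

print(parse_with_lookaheads('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```

Note that the word lookahead requires a "#" somewhere after the words: for a line such as '2: spam eggs' with no comment, only the integer is found and the result is (2,).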

3 answers

Answer 1
score 2, 2014-02-09 20:16:31

Answer 2
score 1, accepted, 2014-02-09 20:16:39

Answer 3
score 1, 2014-02-09 20:36:24
