
Capture all repetitions of a group using Python regular expression

I have an input of the following format:

<integer>: <word> ... # <comment>

where ... can represent one or more <word> strings.

Here is an example:

1: foo bar baz # This is an example

I want to split this input apart with a regular expression and return a tuple that contains the integer followed by each word. For the above example, I want:

(1, 'foo', 'bar', 'baz')

This is what I have tried:

>>> re.match(r'(\d+):( \w+)+', '1: foo bar baz # This is an example').groups()
('1', ' baz')

I am getting the integer and the last word only. How do I get the integer and all the words that the regex matches?

Non-regex solution:

>>> s = '1: foo bar baz # This is an example'
>>> a, _, b = s.partition(':')
>>> [int(a)] + b.partition('#')[0].split()
[1, 'foo', 'bar', 'baz']
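The question asks for a tuple rather than a list, so here is a minimal sketch that wraps the same partition-based approach in a function. The name parse_line is made up for illustration and is not from the original answer.

```python
def parse_line(line):
    """Return (int, word, word, ...) for a line like '1: foo bar # comment'.

    parse_line is a hypothetical helper name, not part of the original answer.
    """
    # Everything before the first ':' is the integer.
    number, _, rest = line.partition(':')
    # Drop the comment (text after the first '#') and split on whitespace.
    words = rest.partition('#')[0].split()
    return (int(number), *words)


print(parse_line('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```

Because partition never raises when the separator is missing, a line without a comment goes through the same code path.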

You can probably make it a lot clearer with simple string manipulation.

my_string = '1: foo bar baz'
num_string, word_string = my_string.split(':')
num = int(num_string)
words = word_string.strip().split(' ')

print(num)
print(words)

Output:

1
['foo', 'bar', 'baz']
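Note that the snippet above uses an input without the trailing comment. With the question's full format, the comment words would end up in the list, and a colon inside the comment would make the two-name unpacking fail. A possible variant, assuming the comment always starts at the first '#' (the sample string here is invented for illustration):

```python
# Sample input invented for illustration; the comment contains a colon on purpose.
my_string = '1: foo bar baz # note: colons may appear here'

# Keep only the text before the first '#', i.e. drop the comment.
content = my_string.split('#', 1)[0]

# Split on the first ':' only, so later colons cannot break the unpacking.
num_string, word_string = content.split(':', 1)
num = int(num_string)

# split() with no argument collapses runs of whitespace and trims the ends.
words = word_string.split()

print(num)    # 1
print(words)  # ['foo', 'bar', 'baz']
```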

The trick here is to use lookaheads: find either digits (followed by a colon) or words (followed by any run of word characters and whitespace, then a hash):

import re

s = "1: foo bar baz # This is an example"
print(re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s))
# ['1', 'foo', 'bar', 'baz']

The only thing that remains is to convert "1" to an int - but you can't do that with regexp.
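For completeness: Python's re module keeps only the last match of a repeated capturing group, which is why the attempt in the question returns just ' baz'. One way around it, sketched here as an alternative to the lookahead pattern, is to capture the whole run of words in a single group, split it afterwards, and do the int conversion in plain Python. The function name parse_with_regex is made up for this example.

```python
import re


def parse_with_regex(line):
    """Return (int, word, ...) using one match plus a split.

    parse_with_regex is a hypothetical name used only for this sketch.
    """
    # Group 1: the integer. Group 2: the entire run of words, captured once.
    # The inner (?:...) is non-capturing, so repeating it loses nothing.
    match = re.match(r'(\d+):((?:\s+\w+)+)', line)
    if match is None:
        raise ValueError(f'unexpected line format: {line!r}')
    number, words = match.groups()
    # The regex only finds the pieces; the int conversion happens here.
    return (int(number), *words.split())


print(parse_with_regex('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```

Unlike the findall version, this anchors at the start of the line and raises if the line does not match the expected shape.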

