Capture all repetitions of a group using Python regular expression
I have an input of the following format:
<integer>: <word> ... # <comment>
where ... can represent one or more <word> strings.
Here is an example:
1: foo bar baz # This is an example
I want to split this input apart with a regular expression and return a tuple that contains the integer followed by each word. For the above example, I want:
(1, 'foo', 'bar', 'baz')
This is what I have tried.
>>> import re
>>> re.match(r'(\d+):( \w+)+', '1: foo bar baz # This is an example').groups()
('1', ' baz')
I am getting the integer and the last word only. How do I get the integer and all the words that the regex matches?
Non-regex solution:
>>> s = '1: foo bar baz # This is an example'
>>> a, _, b = s.partition(':')
>>> [int(a)] + b.partition('#')[0].split()
[1, 'foo', 'bar', 'baz']
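If a tuple is needed, as in the question, the same partition steps can be wrapped in a small function. This is only a sketch: the name parse_line is made up for illustration, and it assumes every line contains a colon after the integer.

```python
def parse_line(line):
    """Split '<integer>: <word> ... # <comment>' into (int, word, ...).

    parse_line is a hypothetical helper name, not part of any library.
    """
    # Everything before the first ':' is the integer.
    number, _, rest = line.partition(':')
    # Everything before the first '#' is the word list; the comment is dropped.
    words = rest.partition('#')[0].split()
    return (int(number), *words)


result = parse_line('1: foo bar baz # This is an example')
print(result)  # (1, 'foo', 'bar', 'baz')
```

Because partition returns the whole string when the separator is missing, a line without a trailing comment should still work.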
You can probably make it a lot clearer with simple string manipulation.
my_string = '1: foo bar baz'
num_string, word_string = my_string.split(':')
num = int(num_string)
words = word_string.strip().split(' ')
print(num)
print(words)
Output:
1
['foo', 'bar', 'baz']
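Note that the example string above has no trailing comment. For the full input format from the question, one possible variant strips the comment first and limits each split to the first separator. This is a sketch: the function name split_number_and_words is hypothetical, and it assumes the first colon always follows the integer.

```python
def split_number_and_words(line):
    # Hypothetical helper, named here only for illustration.
    # Drop the trailing comment, if any; maxsplit=1 cuts at the first '#'.
    body = line.split('#', 1)[0]
    # maxsplit=1 so a colon inside the comment or words cannot break unpacking.
    num_string, word_string = body.split(':', 1)
    # split() with no argument also collapses repeated whitespace.
    return int(num_string), word_string.split()


num, words = split_number_and_words('1: foo bar baz # note: has a colon')
print(num)    # 1
print(words)  # ['foo', 'bar', 'baz']
```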
The trick here is to use lookaheads: find either digits (followed by a colon) or words (followed by any run of letters/spaces and then a hash):
import re

s = "1: foo bar baz # This is an example"
print(re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s))
# ['1', 'foo', 'bar', 'baz']
The only thing that remains is to convert "1" to an int, which the regex itself cannot do; call int() on the first element.
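Putting the lookahead pattern and the int conversion together might look like the sketch below. The function name capture_all is invented for this example, and the pattern assumes the line always ends with a "# comment" part, since the word lookahead needs a hash somewhere to the right.

```python
import re

# Digits followed by ':' OR a word followed only by word chars/whitespace up to '#'.
TOKEN_RE = re.compile(r'\d+(?=:)|\w+(?=[\w\s]*#)')


def capture_all(line):
    # capture_all is a hypothetical name used only for this illustration.
    tokens = TOKEN_RE.findall(line)
    # The first token is the integer; the rest are the words.
    return (int(tokens[0]), *tokens[1:])


print(capture_all('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```

Words inside the comment are not returned because no hash follows them, so their lookahead fails.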