
Grabbing multiple patterns in a string using regex

In Python, I'm trying to grab multiple inputs from a string using a regular expression; however, I'm having trouble. For the string:

inputs       =    12 1  345 543 2

I tried using:

match = re.match(r'\s*inputs\s*=(\s*\d+)+',string)

However, this only returns the value '2'. I'm trying to capture all the values '12', '1', '345', '543', '2' but I'm not sure how to do this.

Any help is greatly appreciated!

EDIT: Thank you all for explaining why this does not work and providing alternative suggestions. Sorry if this is a repeat question.

You could try something like: re.findall(r"\d+", your_string)

You cannot do this with a single regex (unless you were using .NET), because each capturing group will only ever return one result even if it is repeated (the last one in the case of Python).
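The "last one wins" behaviour can be seen directly with the pattern from the question. This is a minimal sketch using only the standard re module; note that the group as written also includes the whitespace before the final number.

```python
import re

s = 'inputs       =    12 1  345 543 2'
match = re.match(r'\s*inputs\s*=(\s*\d+)+', s)

# The group is repeated five times, but only the final repetition is kept.
last_capture = match.group(1)
print(repr(last_capture))          # ' 2'
print(match.groups())              # (' 2',)
```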

Since variable-length lookbehinds are also not possible (if they were, you could use (?<=inputs.*=.*)\d+ ), you will have to separate this into two steps:

match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s*\d+)+)', string)
integers = re.split(r'\s+',match.group(1))

So now you capture the entire list of integers (and the spaces between them), and then you split that capture at the spaces.

The second step could also be done using findall:

integers = re.findall(r'\d+',match.group(1))

The results are identical.

You can nest the two regular expressions in a single expression:
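As a quick check of the two-step approach, the sketch below runs both variants of the second step on the string from the question and compares them. It also tries to compile the lookbehind pattern mentioned earlier, to show that the standard re module rejects it; the variable names here are only illustrative.

```python
import re

s = 'inputs       =    12 1  345 543 2'

# Step 1: capture the whole run of integers (with the spaces between them).
match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s*\d+)+)', s)
captured = match.group(1)

# Step 2: either split on whitespace or pull out each run of digits.
by_split = re.split(r'\s+', captured)
by_findall = re.findall(r'\d+', captured)
print(by_split)                    # ['12', '1', '345', '543', '2']
print(by_split == by_findall)      # True

# The single-pattern alternative does not compile in the standard re module,
# because the lookbehind has no fixed width.
try:
    re.compile(r'(?<=inputs.*=.*)\d+')
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True
print(lookbehind_rejected)         # True
```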

import re
s = 'inputs       =    12 1  345 543 2'
print(re.findall(r'(\d+)', re.match(r'inputs\s*=\s*([\s\d]+)', s).group(1)))
# Output: ['12', '1', '345', '543', '2']

Or do it in layers:

import re

def get_inputs(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    match = re.match(regex, s)
    if not match:
        return False # or raise an exception - whatever you want
    else:
        return re.findall(r'(\d+)', match.group(1))

s = 'inputs       =    12 1  345 543 2'
print(get_inputs(s))
# Output: ['12', '1', '345', '543', '2']
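The comment in the function leaves the no-match case open. One possible choice, sketched here under the assumption that callers simply want to iterate over the result, is to return an empty list instead of False. The name get_inputs_or_empty is hypothetical and not part of the original answer.

```python
import re

def get_inputs_or_empty(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    # Hypothetical variant: return [] when the prefix does not match,
    # so the caller can loop over the result without a type check.
    match = re.match(regex, s)
    return re.findall(r'\d+', match.group(1)) if match else []

found = get_inputs_or_empty('inputs       =    12 1  345 543 2')
missing = get_inputs_or_empty('outputs = 1 2 3')
print(found)     # ['12', '1', '345', '543', '2']
print(missing)   # []
```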

You should look at this answer: https://stackoverflow.com/a/4651893/1129561

In short:

In Python, this isn't possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
