How do I collect values into a list in Python standard regex?
I have a string with repeated parts:
s = '[1][2][5] and [3][8]'
I want to group the numbers into two lists using re.match. The expected result is:
{'x': ['1', '2', '5'], 'y': ['3', '8']}
I tried this expression, which gives the wrong result:
re.match(r'^(?:\[(?P<x>\d+)\])+ and (?:\[(?P<y>\d+)\])+$', s).groupdict()
# {'x': '5', 'y': '8'}
It looks like re.match keeps only the last match for each group. How do I collect all the parts into a list instead of only the last one?
Of course, I know that I could split the line on the ' and ' separator and use re.findall on the parts instead, but that approach is not general enough: it causes problems with more complex strings, so I would have to work out the correct split separately every time.
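The behaviour described in the question can be reproduced in isolation. The following is a minimal sketch, using only the standard re module, of why a capturing group placed inside a repetition ends up holding just the final repetition, while a group wrapped around the whole repetition keeps the full run as one substring. The variable names last_only and whole_run are illustrative only.

```python
import re

s = '[1][2][5] and [3][8]'

# The capturing group sits INSIDE the repeated (?:...)+ group, so it is
# overwritten on every repetition and only the final value survives.
m = re.match(r'(?:\[(\d+)\])+', s)
last_only = m.group(1)
print(last_only)  # 5

# Here the capturing group wraps the WHOLE repetition, so the match object
# keeps the complete run of bracketed numbers as a single string.
m = re.match(r'((?:\[\d+\])+)', s)
whole_run = m.group(1)
print(whole_run)  # [1][2][5]
```

The second form does not produce a list by itself, but the captured run can be searched again for the individual numbers.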
We can use regular expressions here. First, iterate over the input string looking for matches of the form [3][8]. For each match, use re.findall to generate a list of number strings. Then add a key whose value is that list. Note that we maintain a list of keys and pop each one as we use it.
import re
s = '[1][2][5] and [3][8]'
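keys = ['x', 'y']
d = {}
# Each match is one run of bracketed numbers, e.g. '[1][2][5]'
for m in re.finditer(r'(?:\[\d+\])+', s):
    # Take the next unused key and store the digits found in this run
    d[keys.pop(0)] = re.findall(r'\d+', m.group())
print(d) # {'x': ['1', '2', '5'], 'y': ['3', '8']}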
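The same idea can be written without consuming the key list. This is a sketch of a variant, not the answer's own code: it pairs each key with the corresponding run using zip, so keys is left unchanged afterwards. One difference to be aware of is that zip silently stops at the shorter input, whereas keys.pop(0) raises IndexError if the string contains more runs than there are keys.

```python
import re

s = '[1][2][5] and [3][8]'
keys = ['x', 'y']

# finditer yields one match per run of bracketed numbers, in order
runs = re.finditer(r'(?:\[\d+\])+', s)

# Pair the first key with the first run, the second key with the second run
d = {key: re.findall(r'\d+', m.group()) for key, m in zip(keys, runs)}
print(d)  # {'x': ['1', '2', '5'], 'y': ['3', '8']}
```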
import re
s = '[1][2][5] and [3][8]'
# Use a regular expression to extract the numbers from the string
numbers = re.findall(r'\d+', s)
# Group the numbers into a dictionary by slicing the list
result = {
    'x': numbers[:3], # First three numbers
    'y': numbers[3:] # Remaining numbers
}
print(result) # {'x': ['1', '2', '5'], 'y': ['3', '8']}
The regular expression \d+ matches one or more digits, and re.findall() returns a list of all the matches. The dictionary literal then slices that list into the desired lists x and y. Note that the slice position (3) is hard-coded, so this works only when the first part always contains exactly three numbers.
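To see where a fixed slice stops working, here is a small sketch that wraps the same code in a function (split_fixed is a hypothetical name used only for this illustration) and runs it on a second input whose first part has two numbers instead of three.

```python
import re

def split_fixed(s):
    """Split all numbers in s at a fixed position (index 3)."""
    numbers = re.findall(r'\d+', s)
    return {'x': numbers[:3], 'y': numbers[3:]}

# Works when the first part has exactly three numbers
print(split_fixed('[1][2][5] and [3][8]'))
# {'x': ['1', '2', '5'], 'y': ['3', '8']}

# With only two numbers before ' and ', the 3 ends up in the wrong list
print(split_fixed('[1][2] and [3][8]'))
# {'x': ['1', '2', '3'], 'y': ['8']}
```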
If you want to use named capture groups, you can write the pattern so that each named group wraps the whole repeated run of bracketed digits.
Then, after first checking that the pattern matched, you can get the digits by running re.findall on each value in the groupdict:
^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$
Example
import re
s = '[1][2][5] and [3][8]'
m = re.match(r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$', s)
if m:
    # Each groupdict value is a whole run such as '[1][2][5]'; pull out the digits
    dct = {k: re.findall(r"\d+", v) for k, v in m.groupdict().items()}
    print(dct)
Output
{'x': ['1', '2', '5'], 'y': ['3', '8']}
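If this is needed in more than one place, the two steps (match, then findall on each named group) could be wrapped in a small helper. The following is a sketch under assumptions: collect_groups and its item_pattern parameter are hypothetical names, not part of the re module, and the helper returns None when the pattern does not match.

```python
import re

def collect_groups(pattern, text, item_pattern=r'\d+'):
    """Return {group name: list of items}, or None if pattern does not match."""
    m = re.match(pattern, text)
    if m is None:
        return None
    # An optional group that did not take part in the match has the value
    # None, so fall back to an empty string before searching it.
    return {name: re.findall(item_pattern, value or '')
            for name, value in m.groupdict().items()}

PATTERN = r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$'

print(collect_groups(PATTERN, '[1][2][5] and [3][8]'))
# {'x': ['1', '2', '5'], 'y': ['3', '8']}
print(collect_groups(PATTERN, 'no brackets here'))
# None
```

Because the outer pattern still validates the overall shape of the string, no manual splitting on ' and ' is needed.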