Get a list of substrings from a list of strings, where the substrings match some regular expression

Question

This question is about Python 3.6+ (but feel free to answer for older Python versions, for the benefit of other readers).

I want to extract a substring from each string that matches a regular expression.

Say I have the following:

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

I want the last 3 digits of all strings matching v-02-\d\d\d, i.e.:

['001', '002', '003']

My naive attempt:

import re

[x[1] for x in list(map(lambda i: re.search(r'v-02-(\d\d\d)', i), a)) if x]

Can you come up with anything more elegant?

Thanks
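For readers on older versions: the x[1] in the attempt above relies on a Python 3.6 feature, where match objects support indexing, so m[1] is the same as m.group(1). The list() around map() is also unnecessary, because the comprehension can consume the map iterator directly. A minimal sketch:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

m = re.search(r'v-02-(\d\d\d)', 'v-02-003')
# Since Python 3.6, match objects support indexing: m[0] is the whole match,
# m[1] is the first capture group (same as m.group(1)).
assert m[1] == m.group(1) == '003'

# Same as the attempt above, minus the redundant list() around map().
result = [x[1] for x in map(lambda i: re.search(r'v-02-(\d\d\d)', i), a) if x]
print(result)  # ['001', '002', '003']
```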

Answer 1

You could do something like this:

import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
pattern = re.compile(r'v-02-(\d{3})$')  # raw string, so \d is not treated as a string escape
print([m.group(1) for m in map(pattern.match, a) if m])

Output

['001', '002', '003']
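A variant of the same idea, assuming Python 3.4+: Pattern.fullmatch() requires the entire string to match, so the $ anchor can be dropped from the pattern. It also behaves slightly differently from match() with $, because $ additionally matches just before a trailing newline:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

# fullmatch() (Python 3.4+) requires the whole string to match,
# so no ^ or $ anchors are needed in the pattern.
pattern = re.compile(r'v-02-(\d{3})')
result = [m.group(1) for m in map(pattern.fullmatch, a) if m]
print(result)  # ['001', '002', '003']

# Unlike match() with '$', fullmatch() rejects a trailing newline.
assert re.compile(r'v-02-(\d{3})$').match('v-02-004\n') is not None
assert pattern.fullmatch('v-02-004\n') is None
```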

You could also use finditer:

print([m.group(1) for ms in map(pattern.finditer, a) for m in ms])

Output

['001', '002', '003']
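The two versions are not quite equivalent: match() is anchored at the start of each string, while finditer() searches anywhere in it, so they differ on strings that have extra text before v-02-. A small illustration using a made-up input string:

```python
import re

pattern = re.compile(r'v-02-(\d{3})$')
s = 'xv-02-004'  # hypothetical string with a leading extra character

# match() only tries position 0, so the leading 'x' prevents a match.
print(pattern.match(s))  # None
# finditer() scans the whole string, so it still finds 'v-02-004'.
print([m.group(1) for m in pattern.finditer(s)])  # ['004']
```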

Answer 2

Four ways to do this.

The first is just a plain old loop:

import re

li = []
for s in a:
    m = re.search(r'v-02-(\d\d\d)', s)
    if m:
        li.append(m.group(1))
# li == ['001', '002', '003']

The second makes two calls to the same regex in a list comprehension:

>>> [re.search(r'v-02-(\d\d\d)', s).group(1) for s in a if re.search(r'v-02-(\d\d\d)', s)]
['001', '002', '003']

The third is to use map:

>>> [m.group(1) for m in map(lambda s: re.search(r'v-02-(\d\d\d)', s), a) if m]
['001', '002', '003']

Finally, you can flatten the list into a single string with str.join and then use findall:

>>> re.findall(r'\bv-02-(\d\d\d)\b', '\t'.join(a))
['001', '002', '003']

Or use \n as the separator together with re.M instead of the two \b anchors:

>>> re.findall(r'^v-02-(\d\d\d)$', '\n'.join(a), flags=re.M)
['001', '002', '003']
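The two join-based variants are not interchangeable: \b only requires a word boundary, so an element with extra text after the digits (separated by a non-word character such as -) still matches, while ^...$ under re.M requires each element to match as a whole line. Both also assume that no element contains the separator character. A quick comparison using a made-up list b:

```python
import re

b = ['v-02-001', 'v-02-002-extra']  # hypothetical input list

# With \b anchors, the '-' after the digits counts as a word boundary,
# so the suffixed element still produces a result.
word_boundary_hits = re.findall(r'\bv-02-(\d\d\d)\b', '\t'.join(b))
print(word_boundary_hits)  # ['001', '002']

# With ^...$ and re.M, each element must match as a complete line.
line_anchor_hits = re.findall(r'^v-02-(\d\d\d)$', '\n'.join(b), flags=re.M)
print(line_anchor_hits)  # ['001']
```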

I would probably write this in that same order if I were writing this bit of code.

What counts as more elegant is in the eye of the beholder, I suppose. I would consider the last one the most elegant.

You can also skip the regex and use Python's string methods:

>>> prefix='v-02-'
>>> [e[len(prefix):] for e in filter(lambda s: s.startswith(prefix),a)]
['001', '002', '003']

That would likely be the fastest, if speed matters here.
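Note that the filter above keeps everything after the prefix, so hypothetical elements like 'v-02-abc' or 'v-02-0001' would yield 'abc' and '0001', which the regex v-02-(\d\d\d) would not. A sketch that also checks the suffix, still without re; it relies on str.isdecimal(), which accepts the same Unicode decimal digits (category Nd) that \d matches in a str pattern:

```python
a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
# Two made-up strings that the plain prefix filter would wrongly accept.
c = a + ['v-02-abc', 'v-02-0001']
prefix = 'v-02-'

# Keep only strings that are the prefix followed by exactly 3 decimal digits,
# i.e. strings that would fully match v-02-(\d\d\d).
result = [
    s[len(prefix):]
    for s in c
    if s.startswith(prefix)
    and len(s) == len(prefix) + 3
    and s[len(prefix):].isdecimal()
]
print(result)  # ['001', '002', '003']
```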

Python 3.8 (released in October 2019) adds a more elegant alternative. As defined in PEP 572, you can use an assignment expression (the := operator) to assign the match and test it in one step:

[m.group(1) for s in a if (m:=re.search(r'v-02-(\d\d\d)', s))]
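For completeness, a self-contained version of that comprehension (requires Python 3.8 or later). One detail from PEP 572 worth knowing: a := target inside a comprehension binds in the enclosing scope, so m is still visible afterwards and holds the result for the last string tested, which here is None because 'v-03-001' does not match:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

# := assigns the match object to m, and the if clause tests it;
# m.group(1) then reads the captured digits for the strings that matched.
result = [m.group(1) for s in a if (m := re.search(r'v-02-(\d\d\d)', s))]
print(result)  # ['001', '002', '003']

# m leaked into the enclosing scope: it is the search result for 'v-03-001'.
print(m)  # None
```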

2 answers

Answer 1: score 1, accepted, 2018-10-10 15:14:43

Answer 2: score 1, 2018-10-10 15:35:03
