Get a list of substrings from a list of strings where the substrings match a certain regular expression
This question is for Python 3.6+ (but feel free to answer for older versions of Python for the benefit of other readers).
I want to extract a substring from each string that matches a regular expression.
Say I have the following:
a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
I want the last 3 digits of all strings matching v-02-\d\d\d, i.e.:
['001', '002', '003']
My naive attempt:
import re
[x[1] for x in list(map(lambda i: re.search(r'v-02-(\d\d\d)', i), a)) if x]
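The naive attempt works for two reasons: re.search returns None when there is no match, so `if x` drops those strings, and since Python 3.6 a Match object supports indexing, so x[1] is the same as x.group(1). The list() call around map is not needed. A quick sanity check, with a redefined so the snippet runs on its own:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

# re.search returns a Match object on success and None on failure
m = re.search(r'v-02-(\d\d\d)', 'v-02-001')
no_match = re.search(r'v-02-(\d\d\d)', 'v-01-001')

# Since Python 3.6, m[1] is shorthand for m.group(1)
print(m[1], m.group(1))  # 001 001
print(no_match)          # None

# The same comprehension as the question's, without the redundant list() call
result = [x[1] for x in map(lambda i: re.search(r'v-02-(\d\d\d)', i), a) if x]
print(result)  # ['001', '002', '003']
```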
Can you come up with anything more elegant?
Thanks
You could do something like this:
import re
a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
pattern = re.compile(r'v-02-(\d{3})$')  # raw string, so \d is not treated as a string escape
print([m.group(1) for m in map(pattern.match, a) if m])
Output:
['001', '002', '003']
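pattern.match only anchors at the start of the string, which is why the pattern above needs a trailing $ to anchor the end. Pattern.fullmatch (available since Python 3.4) anchors both ends, so the $ can be dropped. A sketch of that variant:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

# fullmatch succeeds only if the whole string matches, so no trailing $ is needed
pattern = re.compile(r'v-02-(\d{3})')
result = [m.group(1) for m in map(pattern.fullmatch, a) if m]
print(result)  # ['001', '002', '003']

# Without $, plain match accepts extra trailing characters; fullmatch rejects them
print(pattern.match('v-02-0011') is not None)  # True
print(pattern.fullmatch('v-02-0011'))          # None
```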
Also, you could use finditer:
print([m.group(1) for ms in map(pattern.finditer, a) for m in ms])
Output:
['001', '002', '003']
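The two versions are not quite equivalent: finditer scans the whole string, while match only tries position 0. With the same $-anchored pattern, a string with an extra leading character is skipped by match but found by finditer. The list b below is made up for illustration:

```python
import re

pattern = re.compile(r'v-02-(\d{3})$')
# Hypothetical input: the first string has an extra leading character
b = ['xv-02-004', 'v-02-005']

via_match = [m.group(1) for m in map(pattern.match, b) if m]
via_finditer = [m.group(1) for ms in map(pattern.finditer, b) for m in ms]

print(via_match)     # ['005']
print(via_finditer)  # ['004', '005']
```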
Four ways to do this.
The first is just a regular ol' loop:
import re
li = []
for s in a:
    m = re.search(r'v-02-(\d\d\d)', s)
    if m:
        li.append(m.group(1))
# li=['001', '002', '003']
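If the loop is needed in more than one place, it can be wrapped in a small generator function. The name extract_group and its parameters are hypothetical, not part of the re module; this is just one way to package the loop:

```python
import re
from typing import Iterable, Iterator


def extract_group(pattern: str, strings: Iterable[str], group: int = 1) -> Iterator[str]:
    """Hypothetical helper: yield the given capture group from every string the pattern matches."""
    regex = re.compile(pattern)
    for s in strings:
        m = regex.search(s)
        if m:
            yield m.group(group)


a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
li = list(extract_group(r'v-02-(\d\d\d)', a))
print(li)  # ['001', '002', '003']
```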
The second is two calls to the same regex in a list comprehension:
>>> [re.search(r'v-02-(\d\d\d)', s).group(1) for s in a if re.search(r'v-02-(\d\d\d)', s)]
['001', '002', '003']
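Calling re.search twice per string does the matching work twice for every string that matches. On Python versions before 3.8, a common way to search each string only once is to do the search in an inner generator expression and filter in the outer comprehension:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

# Inner generator: one search per string; outer comprehension: drop None and extract the group
result = [m.group(1) for m in (re.search(r'v-02-(\d\d\d)', s) for s in a) if m]
print(result)  # ['001', '002', '003']
```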
The third is to use map:
>>> [m.group(1) for m in map(lambda s: re.search(r'v-02-(\d\d\d)', s), a) if m]
['001', '002', '003']
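The lambda can be avoided by compiling the pattern and passing its bound search method to map directly; filter(None, ...) then drops the None results, since Match objects are always truthy. A sketch of that variant:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
pattern = re.compile(r'v-02-(\d\d\d)')

# pattern.search is a bound method, so no lambda is needed;
# filter(None, ...) keeps only truthy values, i.e. the Match objects
result = [m.group(1) for m in filter(None, map(pattern.search, a))]
print(result)  # ['001', '002', '003']
```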
Finally, you can join the list into a single string with .join and then use findall:
>>> re.findall(r'\bv-02-(\d\d\d)\b', '\t'.join(a))
['001', '002', '003']
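The \b anchors matter because the joined string no longer has per-element boundaries. With some made-up near-miss inputs (four digits, and an extra leading letter), the anchored pattern rejects both, while the unanchored one picks up false positives:

```python
import re

# Hypothetical near-misses: a four-digit suffix, and a leading extra letter
tricky = ['v-02-0011', 'av-02-002']
joined = '\t'.join(tricky)

with_b = re.findall(r'\bv-02-(\d\d\d)\b', joined)
without_b = re.findall(r'v-02-(\d\d\d)', joined)

print(with_b)     # []
print(without_b)  # ['001', '002']
```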
Or, join with \n and use re.M instead of the two \b anchors:
>>> re.findall(r'^v-02-(\d\d\d)$', '\n'.join(a), flags=re.M)
['001', '002', '003']
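The re.M flag is what makes this work: without it, ^ and $ match only at the very start and end of the whole joined string, so nothing in this data matches. A side-by-side check:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']
joined = '\n'.join(a)

# Without re.M, ^ only matches at position 0, where the string starts with 'v-01-'
without_m = re.findall(r'^v-02-(\d\d\d)$', joined)
# With re.M, ^ and $ also match at the start and end of every line
with_m = re.findall(r'^v-02-(\d\d\d)$', joined, flags=re.M)

print(without_m)  # []
print(with_m)     # ['001', '002', '003']
```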
I would probably write this in that same order if I were writing this bit of code.
What counts as more elegant is in the eye of the beholder, I suppose. I would consider the last one to be the most elegant.
You can also skip the regex and use Python's string methods:
>>> prefix='v-02-'
>>> [e[len(prefix):] for e in filter(lambda s: s.startswith(prefix),a)]
['001', '002', '003']
That would likely be the fastest, if speed matters in this case.
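One caveat: unlike the regex versions, the string-method version keeps whatever follows the prefix, whether or not it is three digits. If the input might contain such strings, a check can be added. The sketch below assumes "exactly three decimal digits" is the rule; str.isdecimal accepts the same Unicode decimal digits that \d matches in a str pattern. The list samples, including its bad last item, is made up for illustration:

```python
prefix = 'v-02-'
# Hypothetical input: the last item has a non-digit suffix
samples = ['v-01-001', 'v-02-001', 'v-02-002', 'v-02-003', 'v-02-12x']

# Slice off the prefix, then keep the rest only if it is exactly three decimal digits
result = [
    rest
    for rest in (s[len(prefix):] for s in samples if s.startswith(prefix))
    if len(rest) == 3 and rest.isdecimal()
]
print(result)  # ['001', '002', '003']
```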
Starting with Python 3.8 (released in October 2019), there is a more elegant alternative. As defined in PEP 572, you can use an assignment expression (the := "walrus" operator) to assign the match and test it in one step:
[m.group(1) for s in a if (m:=re.search(r'v-02-(\d\d\d)', s))]
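One side effect of := inside a comprehension, specified in PEP 572, is worth knowing: the target is bound in the enclosing scope, not only inside the comprehension. After the comprehension runs, m still holds the result of the last search, which here is None because the last string, 'v-03-001', does not match. This requires Python 3.8+:

```python
import re

a = ['v-01-001', 'v-01-002', 'v-02-001', 'v-02-002', 'v-02-003', 'v-03-001']

result = [m.group(1) for s in a if (m := re.search(r'v-02-(\d\d\d)', s))]
print(result)  # ['001', '002', '003']

# m leaks out of the comprehension and holds the last value assigned to it
print(m)  # None
```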