Pythonic way to filter a list of data using a regex?

Question

I have a list of strings which I'd like to filter using a regex. I have the beginnings of a solution:

import re

lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')
relevant_lines = [r.findall(line) for line in lines]
print(relevant_lines)

...that almost works:

[[], ['Data of interest'], [], ['Data of Interest'], [], ['Data of interest']]

...but is there a way to populate the resulting list with only the lines that match, and without the nested lists?

Edit - is there a cleaner way than the following?

[r[0] for r in [r.findall(line) for line in lines] if len(r) > 0]

Answer 1

Just use a normal loop; not everything is suitable for a list comp:

r = re.compile(r'.*[iI]nterest.*')
relevant_lines = []
for line in lines:
    mtch = r.match(line)
    if mtch:
        relevant_lines.append(mtch.group())
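
For single-line strings like these, mtch.group() is simply the whole line, so another option is to test each line and keep the line itself. This is only a sketch of that variant, not the loop above: it swaps match for search, and the trimmed pattern and the name pattern are assumptions made for illustration.

```python
import re

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']

# Assumption: the pattern is trimmed to the part that matters, because
# search() scans the whole string and needs no leading/trailing '.*'.
pattern = re.compile(r'[iI]nterest')

# Keep each line in which search() finds the pattern anywhere.
relevant_lines = [line for line in lines if pattern.search(line)]
print(relevant_lines)
```

Because the original line is kept rather than the matched text, this only gives the same result when the match would have covered the whole line anyway.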

If you do want a one-liner, a generator expression combined with filtering out the empty lists would be better:

relevant_lines = filter(None,(r.findall(line) for line in lines))
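
Two things are worth checking with the filter/findall version: in Python 3 filter returns a lazy iterator rather than a list, and each surviving item is still a one-element list. A small sketch of both points follows; flattening with itertools.chain.from_iterable is my own addition, not part of the answer.

```python
import re
from itertools import chain

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# In Python 3, filter() is lazy, so wrap it in list() to materialise it.
# filter(None, ...) drops the empty lists but keeps the nesting.
nested = list(filter(None, (r.findall(line) for line in lines)))
print(nested)

# chain.from_iterable flattens one level; empty lists contribute nothing,
# so no separate filtering step is needed.
relevant_lines = list(chain.from_iterable(r.findall(line) for line in lines))
print(relevant_lines)
```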

Or indeed filter with match:

[x.group() for x in filter(None,(r.match(line) for line in lines))]

For Python 2, use itertools.ifilter.

Or, for a more functional approach (in Python 2, swap map for itertools.imap and filter for itertools.ifilter):

[x.group() for x in filter(None, map(r.match, lines))]

Your own list comp can be rewritten using a generator expression for the inner loop:

[found[0] for found in (r.findall(line) for line in lines) if found]

If you don't need the list, use a generator expression and just iterate over it.
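
A minimal sketch of that last suggestion, assuming Python 3 (where map is lazy); the names matching and seen are made up for the example:

```python
import re

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# Generator expression: no line is matched until iteration starts.
matching = (m.group() for m in map(r.match, lines) if m)

seen = []
for text in matching:
    seen.append(text)  # stand-in for whatever per-line processing is needed

print(seen)
```

Note that a generator can only be consumed once; iterating over matching a second time yields nothing.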

Answer 2

relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]

Here is the result in the console:

>>> import re
>>> lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
>>> r = re.compile(r'.*[iI]nterest.*')
>>> relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]
>>> relevant_lines
['Data of interest', 'Data of Interest', 'Data of interest']

It's not complicated. Combining functional programming with generators works well here.

Answer 1
2 2015-02-13 15:52:00

Answer 2
1 2015-02-13 16:36:25
