Using re.search in a Python filter function

Question

I am not able to use re.search inside a filter expression.

I am trying to use re.search to extract the href values from a list in which each element is a line of HTML.

Here is what I am doing:

>>> filter(lambda html_line: re.search('.*a href=\"([^\"]*).*', html_line), data)

[u'Directory Feb 28 23:57 <b><a href="/MyApp/LogBrowser?type=crawler/2014.02.28">2014.02.28</a></b>'
 u'Directory Mar 01 23:59 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.01">2014.03.01</a></b>'
 u'Directory Mar 02 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.02">2014.03.02</a></b>'
 u'Directory Mar 03 23:59 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.03">2014.03.03</a></b>'
 u'Directory Mar 04 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.04">2014.03.04</a></b>'
 u'Directory Mar 05 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.05">2014.03.05</a></b>'
 u'Directory Mar 06 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.06">2014.03.06</a></b>'
 u'Directory Mar 07 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.07">2014.03.07</a></b>'
 u'Directory Mar 08 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.08">2014.03.08</a></b>']

My re.search call itself seems to be working correctly.

For example, this works:

>>> for html_line in data:
    print re.search('.*a href=\"([^\"]*).*', html_line).group(1)

/MyApp/LogBrowser?type=crawler/2014.02.28
/MyApp/LogBrowser?type=crawler/2014.03.01
/MyApp/LogBrowser?type=crawler/2014.03.02
/MyApp/LogBrowser?type=crawler/2014.03.03
/MyApp/LogBrowser?type=crawler/2014.03.04
/MyApp/LogBrowser?type=crawler/2014.03.05
/MyApp/LogBrowser?type=crawler/2014.03.06
/MyApp/LogBrowser?type=crawler/2014.03.07
/MyApp/LogBrowser?type=crawler/2014.03.08

Answer 1

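filter only selects items: it returns the original elements for which the function gives a true value, not the href values. Since re.search returns a match object (truthy) or None, every matching line is passed through unchanged. You can use a list comprehension to extract the captured group instead: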

import re

# Compile once, then keep group(1) of every line that matched.
r = re.compile(r'.*a href=\"([^\"]*).*')
data = [x.group(1) for x in (r.search(html_line) for html_line in data)
        if x is not None]
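
To make the difference concrete, here is a minimal self-contained sketch comparing the two approaches. The sample list is a shortened stand-in for the question's data, with one hypothetical line without a link added for illustration; the names href_re, kept_lines, and hrefs are also just illustrative. It is written so that it should run on Python 3, where filter returns an iterator and needs list() around it (on Python 2, as in the question, filter already returns a list).

```python
import re

# Hypothetical sample data, shortened from the question's listing.
# The middle entry is invented to show what happens when nothing matches.
data = [
    'Directory Feb 28 23:57 <b><a href="/MyApp/LogBrowser?type=crawler/2014.02.28">2014.02.28</a></b>',
    'File Mar 01 23:59 no link here',
    'Directory Mar 02 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.02">2014.03.02</a></b>',
]

# search() scans the whole string, so the leading and trailing .* are not needed.
href_re = re.compile(r'a href="([^"]*)')

# filter() uses the match object only as a truth value, so it returns
# the original lines that matched, not the captured group.
kept_lines = list(filter(href_re.search, data))

# A comprehension can hold on to the match object and pull out group(1),
# skipping lines where search() returned None.
matches = (href_re.search(line) for line in data)
hrefs = [m.group(1) for m in matches if m is not None]

print(kept_lines)
print(hrefs)
```

Here kept_lines still contains the two full HTML lines, while hrefs contains only the two URL paths.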

Answer 1: score 3, accepted, 2014-03-31 17:22:36
