What's wrong with my soup?

Question

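I'm using BeautifulSoup 4 with Python to find links in an HTML page whose text matches a particular regular expression. I can find links, and I can find text that matches the regex, but combining the two in a single search doesn't work. Here is my code: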

import re
import bs4

s = '<a href="javascript://">Sign in&nbsp;<br /></a>'

soup = bs4.BeautifulSoup(s)

match = re.compile(r'sign\s?in', re.IGNORECASE)

print soup.find_all(text=match)  # [u'Sign in\xa0']
print soup.find_all(name='a')[0].text  # Sign in 

print soup.find_all('a', text=match) # []

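The comments show the output. As you can see, the combined search returns no results, which is strange.

It seems to be related to the "br" tag (or tags in general) contained in the link text. If I remove it, everything works as expected.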

Answer 1

You can search for a tag or for its text content, but not for both together.

Given:

self.name = u'a'
self.text = SRE_Pattern: <_sre.SRE_Pattern object at 0xd52a58>

From the source:

# If it's text, make sure the text matches.
elif isinstance(markup, NavigableString) or \
         isinstance(markup, basestring):
    if not self.name and not self.attrs and self._matches(markup, self.text):
        found = markup

So the behavior that @Totem pointed out is by design.
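
One possible workaround, offered as a sketch rather than as the library's recommended approach: pass a function to find_all and check the tag name and the combined text yourself. The helper name is_matching_link is hypothetical, and the explicit 'html.parser' argument is an assumption added so the example does not depend on which parser happens to be installed. The sketch also prints .string, which is None for a tag with more than one child; that is probably why the br tag makes a difference.

```python
import re
import bs4

s = '<a href="javascript://">Sign in&nbsp;<br /></a>'
soup = bs4.BeautifulSoup(s, 'html.parser')  # parser choice is an assumption

match = re.compile(r'sign\s?in', re.IGNORECASE)

# The <a> tag has two children (a string and <br/>), so .string is None.
print(soup.a.string)


def is_matching_link(tag):
    # Hypothetical helper: accept <a> tags whose combined text matches.
    return tag.name == 'a' and match.search(tag.get_text()) is not None


links = soup.find_all(is_matching_link)
print(links)
```

Because get_text() joins all the strings inside the tag, the extra br child no longer gets in the way of the match.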
