Missing part of the results with Beautiful Soup

Question

I am trying to retrieve a few <p> tags from the following HTML. This is only part of it:

<td class="eelantext">
    <a class="fBlackLink"></a>
    <center></center>
    <span> … </span><br></br>
    <table width="402" vspace="5" cellspacing="0" cellpadding="3" 
        border="0" bgcolor="#ffffff" align="Left">
    <tbody> … </tbody></table>
      <!--edstart-->
    <p> … </p>
    <p> … </p>
    <p> … </p>
    <p> … </p>
    <p> … </p>
</td>

You can find the webpage here.

My Python code is the following:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page)  # page holds the downloaded HTML
div = soup.find('td', attrs={'class': 'eelantext'})
print(div)
text = div.find_all('p')

But the text variable is empty, and when I print the div variable I get exactly the HTML shown above, except that the <p> tags are missing.

Answer 1

BeautifulSoup can use different parsers to handle HTML input. The HTML input here is a little broken, and the default HTMLParser parser doesn't handle it very well.

Use the html5lib parser instead:

>>> len(BeautifulSoup(r.text, 'html').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'lxml').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'html5lib').find('td', attrs={'class': 'eelantext'}).find_all('p'))
22
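
To run the same comparison on your own input, a minimal sketch along these lines may help. The names SAMPLE_HTML and count_paragraphs are hypothetical and not part of the original answer. The sample markup is a small well-formed stand-in for the real page, so every parser should agree on it; it is the broken markup of the real page that makes the counts diverge as in the session above. Parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the real page: a well-formed table cell
# with the same class name as in the question.
SAMPLE_HTML = """
<table><tr><td class="eelantext">
  <span>intro</span>
  <p>first</p>
  <p>second</p>
</td></tr></table>
"""


def count_paragraphs(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <p> tags inside td.eelantext}.

    Parsers that are not installed are left out of the result.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            continue  # this parser library is not installed
        cell = soup.find("td", attrs={"class": "eelantext"})
        # A missing cell counts as zero paragraphs instead of raising.
        counts[name] = len(cell.find_all("p")) if cell is not None else 0
    return counts


if __name__ == "__main__":
    for name, n in count_paragraphs(SAMPLE_HTML).items():
        print(name, n)
```

Passing the parser name explicitly, rather than relying on whatever BeautifulSoup picks by default, keeps the result the same across machines with different parser libraries installed.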

Answer 1: score 14, accepted, 2013-09-04 12:55:36
