Python Beautiful Soup: parsing a table with a specific ID

Question

I'm trying to get the data from a table whose ID I know. For some reason, the code keeps giving me None.

From the HTML I'm trying to parse:

<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
    <tr class="gridHeader" valign="top">
        <td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td>
        <td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td>
        <td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td>
        <td class="titleGridReg" align="center" valign="top">שער בסיס</td>
        <td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span></td>
        <td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
    </tr>
    <tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">

... and so on ...

My code:

html = br.response().read()
soup = BeautifulSoup(html)

table = soup.find(lambda tag: tag.name=='table' and tag.has_key('id') and tag['id']=="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
rows = table.findAll(lambda tag: tag.name=='tr')

In [100]: print table
None
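The None comes from find() itself: it returns None when no tag satisfies the filter, and calling findAll on that None then raises AttributeError. A minimal sketch of this behaviour, assuming bs4 4.x with the built-in "html.parser"; the markup and the id "missing-id" are hypothetical stand-ins, and has_attr() is used as the bs4 spelling of the question's has_key() check:

```python
from bs4 import BeautifulSoup

# Hypothetical page: it has a table, but not one with the id we look for
html = '<table id="other"><tr><td>x</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Same filter as the question's lambda, using has_attr() instead of has_key()
table = soup.find(lambda tag: tag.name == "table"
                  and tag.has_attr("id")
                  and tag["id"] == "missing-id")
print(table)  # None: find() signals "no match" by returning None

# Guard before calling find_all(); calling it on None raises AttributeError
rows = table.find_all("tr") if table is not None else []
print(rows)  # []
```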

Answer 1

From the documentation, pass the tag name and the id as a keyword argument:

table = soup.find('table', id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")

And for the rows line:

rows = table.findAll('tr')
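To see both lines working end to end, here is a minimal sketch assuming bs4 4.x and the built-in "html.parser". The two-table markup and the English cell values are hypothetical stand-ins for the real page; find_all() is the PEP 8 name bs4 uses for findAll():

```python
from bs4 import BeautifulSoup

TABLE_ID = ("ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790"
            "_ctl00_HistoryData1_gridHistoryData_DataGrid1")

# Hypothetical trimmed page: an unrelated table plus the one we want
html = f"""
<table id="unrelated"><tr><td>ignore me</td></tr></table>
<table id="{TABLE_ID}">
  <tr class="gridHeader"><td>Date</td><td>Close</td></tr>
  <tr><td>24/10/2013</td><td>1234</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id=TABLE_ID)  # the id keyword filters on the id attribute
rows = table.find_all("tr")               # only rows inside the matched table

# One list of cell texts per row
data = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]
print(data)  # [['Date', 'Close'], ['24/10/2013', '1234']]
```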

For the encoding problem, try decoding the response from UTF-8 and then re-encoding it:

html = br.response().read().decode('utf-8')
soup = BeautifulSoup(html.encode('utf-8'))
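In Python 3 the response body is bytes, and there are two equivalent ways to get the Hebrew text into BeautifulSoup correctly. A minimal sketch, assuming bs4 4.x; the one-cell table is a hypothetical stand-in for br.response().read(), and this is a Python 3 adaptation rather than the answer's own code. from_encoding is the BeautifulSoup constructor argument for naming the encoding of byte input:

```python
from bs4 import BeautifulSoup

# Stand-in for br.response().read(): raw UTF-8 bytes with Hebrew text
raw = '<table id="t"><tr><td>שער נמוך</td></tr></table>'.encode("utf-8")

# Option 1: decode the bytes yourself and pass BeautifulSoup a str
soup1 = BeautifulSoup(raw.decode("utf-8"), "html.parser")

# Option 2: pass the bytes and tell BeautifulSoup which encoding they use
soup2 = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

text1 = soup1.find("table", id="t").td.get_text()
text2 = soup2.find("table", id="t").td.get_text()
print(text1 == text2)  # True
```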

Answer 2

Improving upon aiKid's answer:

# coding=utf-8
from bs4 import BeautifulSoup

html = u"""
<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
                            <tr class="gridHeader" valign="top">
                                <td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td><td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td><td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td><td class="titleGridReg" align="center" valign="top">שער בסיס</td><td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span>
</td><td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
                            </tr><tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">
"""

soup = BeautifulSoup(html)
print soup.find_all("table",
                    id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
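Note that find_all() returns a list, which is empty when nothing matches, rather than None. A minimal Python 3 sketch, assuming bs4 4.x and "html.parser"; the markup is a trimmed copy of three header cells from the question, and since Python 3 str literals are already Unicode, no u prefix is needed:

```python
from bs4 import BeautifulSoup

TABLE_ID = ("ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790"
            "_ctl00_HistoryData1_gridHistoryData_DataGrid1")

# Trimmed copy of the question's header row (attributes shortened)
html = f"""
<table id="{TABLE_ID}">
  <tr class="gridHeader" valign="top">
    <td class="titleGridRegNoB"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td>
    <td class="titleGridReg">שער נמוך</td>
    <td class="titleGridReg">תאריך</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table", id=TABLE_ID)       # list of matching tables
missing = soup.find_all("table", id="no-such-id")  # [] rather than None

# Header labels from the first match
header = tables[0].find("tr", class_="gridHeader")
labels = [td.get_text(strip=True) for td in header.find_all("td")]
print(len(tables), len(missing), len(labels))  # 1 0 3
```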

Since you're working with UTF-8 data, you need to make the string a unicode string, like so: u"""(...)""". All you need to do to work with unicode is this:

br.response().read().decode('utf-8')

The above gives you a unicode string, which you can later encode back into a UTF-8 byte string. Say the string is stored in html: you can encode it with html.encode("utf-8"). If you do this, you do not need to put the u in front of anything; you can treat everything as a regular string again.
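The same round trip in Python 3, where the Python 2 unicode type is called str and the Python 2 byte string is called bytes. A minimal sketch (an adaptation, not part of the original answer) using one of the header labels from the question:

```python
# Python 3 naming: bytes (Python 2 str) and str (Python 2 unicode)
raw = "שער בסיס".encode("utf-8")  # what reading a response gives you: bytes

text = raw.decode("utf-8")          # bytes -> str (Unicode text)
print(type(text).__name__, len(text))    # str 8  (8 characters)

again = text.encode("utf-8")        # str -> bytes (UTF-8 encoded)
print(type(again).__name__, len(again))  # bytes 15  (each Hebrew letter is 2 bytes)
print(again == raw)                 # True
```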

2 answers

Answer 1
Score 10, accepted, 2013-10-25 14:06:19

Answer 2
Score 1, 2013-10-25 14:17:55
