Retrieving table values with the same tag name from HTML using Beautiful Soup in Python

Question

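I am trying to retrieve the text of every td in the table below using Beautiful Soup. Unfortunately, all the cells share the same tag name, so I either get only the first element or some elements are printed more than once. I'm not sure how to go about it.

Below is the HTML table snippet: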

<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr>

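What I want is the text content of all these tags, paired up by position: the first Auto_head corresponds to the first Auto_body, i.e. Address = 1, and all the other values should be retrieved the same way.

I have tried find, findAll, findNext and next_sibling, but with no luck. Here is my current Python code: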

self.table = self.soup_file.findAll(class_="Table")
self.headers = [tab.find(class_="Auto_head").findNext('td',class_="Auto_head").contents[0] for tab in self.table]
self.data = [data.find(class_="Auto_body").findNext('td').contents[0] for data in self.table]

Answer 1

Get the headers first, then use zip(...) to combine them with the values of each row.

from bs4 import BeautifulSoup

data = '''\
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
    </tr>
    <tr>
           <td class="Auto_body">2</td>
           <td class="Auto_body">def</td>
           <td class="Auto_body">no</td>
    </tr>
    <tr>
           <td class="Auto_body">3</td>
           <td class="Auto_body">ghi</td>
           <td class="Auto_body">maybe</td>
    </tr>
</table>
'''

soup = BeautifulSoup(data, 'html.parser')

for table in soup.select('table.Auto'):
    # get rows
    rows = table.select('tr')
    # get headers
    headers = [td.text for td in rows[0].select('td.Auto_head')]
    # get details
    for row in rows[1:]:
        values = [td.text for td in row.select('td.Auto_body')]
        print(dict(zip(headers, values)))

My output:

{'Address': '1', 'Name': 'abc', 'Type': 'yes'}
{'Address': '2', 'Name': 'def', 'Type': 'no'}
{'Address': '3', 'Name': 'ghi', 'Type': 'maybe'}
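
If the rows are needed later rather than just printed, the same idea can be wrapped in a small helper. This is only a sketch: the name `table_to_dicts` and the sample markup are made up for illustration, and it assumes the first `<tr>` of the table is the header row. It uses `get_text(strip=True)` instead of `.text` so that stray whitespace inside a cell does not end up in the keys or values.

```python
from bs4 import BeautifulSoup


def table_to_dicts(html, table_selector='table.Auto'):
    """Return one dict per data row, keyed by the header cells.

    Hypothetical helper; assumes the first <tr> of the table is the header row.
    """
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.select_one(table_selector)
    if table is None:
        return []
    rows = table.select('tr')
    if not rows:
        return []
    # Header names come from the first row, stripped of surrounding whitespace
    headers = [td.get_text(strip=True) for td in rows[0].select('td')]
    records = []
    for row in rows[1:]:
        values = [td.get_text(strip=True) for td in row.select('td')]
        # zip pairs header i with value i and stops at the shorter list
        records.append(dict(zip(headers, values)))
    return records


# Made-up sample input, not the asker's full table
sample = '''
<table class="Auto">
    <tr><td class="Auto_head">Address</td><td class="Auto_head"> Name </td></tr>
    <tr><td class="Auto_body">1</td><td class="Auto_body">abc</td></tr>
    <tr><td class="Auto_body">2</td><td class="Auto_body">def</td></tr>
</table>
'''

records = table_to_dicts(sample)
print(records)
```

Note that a dict keeps only one value per key, so if two header cells have the same text, the later column overwrites the earlier one.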

Answer 2

First collect the cells of each class, then iterate over the two lists together with zip.

from bs4 import BeautifulSoup

s = '''<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr></table>'''

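# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(s, features='html.parser')
# Collect the header cells and the body cells separately
head = soup.find_all(name='td', class_='Auto_head')
body = soup.find_all(name='td', class_='Auto_body')
# Pair them up by position
for one, two in zip(head, body):
    print(f'{one.text}={two.text}')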

Output:

Address=1
Name=abc
Type=yes
Value IN=abc123
AUTO Statement=jar
Value OUT=123abc
RESULT=PASS
=na
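
One caveat with pairing two flat lists: zip stops at the shorter list, so if the table has more than one data row, only the first row is paired with the headers. A possible workaround, sketched below on made-up sample markup, is to slice the flat list of body cells into chunks as wide as the header row. It assumes every data row has exactly as many cells as there are headers.

```python
from bs4 import BeautifulSoup

# Made-up sample with two data rows, to show the chunking
html = '''
<table class="Auto">
    <tr><td class="Auto_head">Address</td><td class="Auto_head">Name</td></tr>
    <tr><td class="Auto_body">1</td><td class="Auto_body">abc</td></tr>
    <tr><td class="Auto_body">2</td><td class="Auto_body">def</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
head = [td.get_text(strip=True) for td in soup.find_all('td', class_='Auto_head')]
body = [td.get_text(strip=True) for td in soup.find_all('td', class_='Auto_body')]

# Slice the flat list of body cells into chunks of len(head), one chunk per row
width = len(head)
rows = [body[i:i + width] for i in range(0, len(body), width)]

for row in rows:
    print(', '.join(f'{h}={v}' for h, v in zip(head, row)))
```

If rows can have missing cells, iterating over each `<tr>` separately, as in Answer 1, is the safer choice.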

See also: searching by CSS class in the Beautiful Soup documentation.

Answer 1: 0, accepted, 2020-11-26 19:12:16

Answer 2: 0, 2020-11-26 19:27:13
