Retrieving table values from HTML with the same tag names using Beautiful Soup in Python

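I am trying to retrieve the text of every td in the table below using Beautiful Soup. Unfortunately, the tag names are all the same, so I can only retrieve the first element, or some elements get printed repeatedly. I am not sure how to go about it.

Below is the HTML table snippet: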

<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr>

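What I want is all the text content inside these tags, with each Auto_head cell paired with the Auto_body cell in the same position, e.g. the first Auto_head with the first Auto_body gives Address = 1. All the other values should be retrieved the same way.

I have tried find, findAll, findNext and next_sibling, but with no luck. Here is my current Python code: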

self.table = self.soup_file.findAll(class_="Table")
self.headers = [tab.find(class_="Auto_head").findNext('td',class_="Auto_head").contents[0] for tab in self.table]
self.data = [data.find(class_="Auto_body").findNext('td').contents[0] for data in self.table]

Get the headers first, then combine them with the values of each row using zip(...):

from bs4 import BeautifulSoup

data = '''\
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
    </tr>
    <tr>
           <td class="Auto_body">2</td>
           <td class="Auto_body">def</td>
           <td class="Auto_body">no</td>
    </tr>
    <tr>
           <td class="Auto_body">3</td>
           <td class="Auto_body">ghi</td>
           <td class="Auto_body">maybe</td>
    </tr>
</table>
'''

soup = BeautifulSoup(data, 'html.parser')

for table in soup.select('table.Auto'):
    # get rows
    rows = table.select('tr')
    # get headers
    headers = [td.text for td in rows[0].select('td.Auto_head')]
    # get details
    for row in rows[1:]:
        values = [td.text for td in row.select('td.Auto_body')]
        print(dict(zip(headers, values)))

My output:

{'Address': '1', 'Name': 'abc', 'Type': 'yes'}
{'Address': '2', 'Name': 'def', 'Type': 'no'}
{'Address': '3', 'Name': 'ghi', 'Type': 'maybe'}
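The same approach can be applied to the full eight-column table from the question, where the last header cell is empty. The following is only a minimal sketch: the function name `table_to_records` and the `column_N` placeholder for blank headers are hypothetical choices made here, not part of the answer above, and the closing `</table>` tag has been added to the question's snippet.

```python
from bs4 import BeautifulSoup

def table_to_records(html, table_selector="table.Auto"):
    """Return one dict per body row, keyed by the header cell text."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for table in soup.select(table_selector):
        rows = table.select("tr")
        if not rows:
            continue
        headers = [td.get_text(strip=True) for td in rows[0].select("td.Auto_head")]
        # Assumption: a blank header gets a positional placeholder name,
        # so its value is not stored under the empty string.
        headers = [h or f"column_{i}" for i, h in enumerate(headers, start=1)]
        for row in rows[1:]:
            values = [td.get_text(strip=True) for td in row.select("td.Auto_body")]
            records.append(dict(zip(headers, values)))
    return records

html = '''<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
       <td class="Auto_body">1</td>
       <td class="Auto_body">abc</td>
       <td class="Auto_body">yes</td>
       <td class="Auto_body">abc123</td>
       <td class="Auto_body">jar</td>
       <td class="Auto_body">123abc</td>
       <td class="Auto_body">PASS</td>
       <td class="Auto_body">na</td>
    </tr>
</table>'''

records = table_to_records(html)
print(records)
```

Returning a list of dicts rather than printing inside the loop keeps the parsed rows available for later use, for example to write them out with the standard `csv` module.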

Get the cells of each class first, then iterate over the two lists together with zip:

s = '''<div>Table</div>
<table class="Auto" width="100%">
    <tr>
       <td class="Auto_head">Address</td>
       <td class="Auto_head">Name</td>
       <td class="Auto_head">Type</td>
       <td class="Auto_head">Value IN</td>
       <td class="Auto_head">AUTO Statement</td>
       <td class="Auto_head">Value OUT</td>
       <td class="Auto_head">RESULT</td>
       <td class="Auto_head"></td>
    </tr>
    <tr>
           <td class="Auto_body">1</td>
           <td class="Auto_body">abc</td>
           <td class="Auto_body">yes</td>
           <td class="Auto_body">abc123</td>
           <td class="Auto_body">jar</td>
           <td class="Auto_body">123abc</td>
           <td class="Auto_body">PASS</td>
           <td class="Auto_body">na</td>
    </tr></table>'''

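from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(s, features='html.parser')
# Collect the header cells and the body cells separately, in document order
head = soup.find_all(name='td', class_='Auto_head')
body = soup.find_all(name='td', class_='Auto_body')
# Pair the n-th header with the n-th body cell
for one, two in zip(head, body):
    print(f'{one.text}={two.text}')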

Address=1
Name=abc
Type=yes
Value IN=abc123
AUTO Statement=jar
Value OUT=123abc
RESULT=PASS
=na
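One caveat with pairing two flat lists: zip stops at the shorter sequence, so if the table has more than one body row, only the first row gets paired with the headers. A possible workaround, sketched here under the assumption that every body row has exactly as many cells as the header row, is to slice the flat list of body cells into chunks of header length. The two-column table and the variable names below are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''
<table class="Auto">
    <tr>
        <td class="Auto_head">Address</td>
        <td class="Auto_head">Name</td>
    </tr>
    <tr>
        <td class="Auto_body">1</td>
        <td class="Auto_body">abc</td>
    </tr>
    <tr>
        <td class="Auto_body">2</td>
        <td class="Auto_body">def</td>
    </tr>
</table>'''

soup = BeautifulSoup(html, "html.parser")
head = [td.text for td in soup.find_all("td", class_="Auto_head")]
body = [td.text for td in soup.find_all("td", class_="Auto_body")]

# A flat zip ends when the headers run out, so the second row is dropped.
flat_pairs = list(zip(head, body))

# Assumption: every row has len(head) cells, so the flat list of body
# cells can be cut into one chunk per row.
width = len(head)
rows = [dict(zip(head, body[i:i + width])) for i in range(0, len(body), width)]
print(rows)
```

If rows can have missing or merged cells, this slicing misaligns the values, and iterating row by row over the tr elements is the safer choice.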

See "Searching by CSS class" in the Beautiful Soup documentation.
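As a small illustration of searching by class, the sketch below uses a shortened copy of the question's table. It shows that class_ is compared against an element's class attribute, which would explain why a search for class_="Table" finds nothing in this markup: "Table" is only the text of the div, while the table element carries the class "Auto". The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div>Table</div>
<table class="Auto" width="100%">
    <tr>
        <td class="Auto_head">Address</td>
        <td class="Auto_head">Name</td>
    </tr>
</table>'''

soup = BeautifulSoup(html, "html.parser")

# No element has class="Table", so this search returns an empty result.
no_match = soup.find_all(class_="Table")

# Keyword form: match td elements whose class attribute contains Auto_head.
by_keyword = [td.text for td in soup.find_all("td", class_="Auto_head")]

# The same query written as a CSS selector.
by_selector = [td.text for td in soup.select("td.Auto_head")]

print(len(no_match), by_keyword, by_selector)
```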
