Extracting tags from the soup with BeautifulSoup

Question

'''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
<div class="kt-post-card__bottom">
<span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
</div>
</div>
'''

Based on the markup I attached, I want to extract every element with the class "kt-post-card__body" and then, from each of them, extract:

("kt-post-card__title", "kt-post-card__description")

as a list.

This is what I tried:

ads = soup.find_all('div',{'class':'kt-post-card__body'})

But with ads[0].div I can only reach "kt-post-card__title", even though "kt-post-card__body" has three child tags in total, including "kt-post-card__description" and "kt-post-card__bottom". Why is that?

Answer 1

Try this:

ads = soup.find_all('div',{'class':'kt-post-card__body'})

ads[0]

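I think you only get the first div because you are calling ads[0].div, which is shorthand for ads[0].find('div') and therefore returns only the first matching tag.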
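To illustrate the point: in BeautifulSoup, `tag.div` behaves like `tag.find('div')`, so it stops at the first `<div>` it finds. Below is a minimal sketch, assuming the markup from the question with the missing quote restored. The names `body`, `first` and `children` are illustrative, and `html.parser` is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

html_doc = '''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
<div class="kt-post-card__bottom">
<span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
</div>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
body = soup.find('div', {'class': 'kt-post-card__body'})

# .div is the same as .find('div'): only the first <div> descendant
first = body.div
same_first = body.find('div')

# recursive=False restricts the search to direct children of body
children = body.find_all('div', recursive=False)
child_classes = [c['class'][0] for c in children]
print(first['class'], child_classes)
```

Using `find_all('div', recursive=False)` is one way to reach all three child tags rather than only the title.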

Answer 2

Since your question is not entirely clear, here is how to extract the classes:

for e in soup.select('.kt-post-card__body'):
    print([c for t in e.find_all() for c in t.get('class', [])])

Output:

['kt-post-card__title', 'kt-post-card__description', 'kt-post-card__bottom', 'kt-post-card__bottom-description', 'kt-text-truncate']
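The last two entries in that output come from a single `<span>`: BeautifulSoup treats `class` as a multi-valued attribute and returns it as a list, while `get('class')` returns `None` for a tag that has no class. A small sketch of both cases follows; the markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

snippet = '<p><span class="a b">x</span><span>y</span></p>'
soup = BeautifulSoup(snippet, 'html.parser')
with_class, without_class = soup.find_all('span')

classes = with_class.get('class')       # list of class names
missing = without_class.get('class')    # None, because the attribute is absent
safe = without_class.get('class', [])   # a default avoids iterating over None
print(classes, missing, safe)
```

Passing a default to `get` keeps a comprehension over the classes from failing on tags without a `class` attribute.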

To get the text, you also have to iterate over your ResultSet; you can then read each element's text to fill your list, or use stripped_strings.

Example

from bs4 import BeautifulSoup

html_doc='''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
<div class="kt-post-card__bottom">
<span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
</div>
</div>
'''

# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html_doc, 'html.parser')

for e in soup.select('.kt-post-card__body'):
    data = [
        e.select_one('.kt-post-card__title').text,
        e.select_one('.kt-post-card__description').text
    ]
    print(data)

Output:

['Example_1', 'Example_2']

Or

print(list(e.stripped_strings))

Output:

['Example_1', 'Example_2', 'Example_4']
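If some cards lack a title or description, `select_one` returns `None` and `.text` then raises an `AttributeError`. Here is a hedged sketch of one way to collect the pairs into a list while tolerating that. The helper `text_or_none` and the second, incomplete card are hypothetical additions for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup

html_doc = '''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
</div>
<div class="kt-post-card__body">
<div class="kt-post-card__title">Only title</div>
</div>
'''

def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None

soup = BeautifulSoup(html_doc, 'html.parser')

# one (title, description) tuple per card
rows = [
    (text_or_none(e, '.kt-post-card__title'),
     text_or_none(e, '.kt-post-card__description'))
    for e in soup.select('.kt-post-card__body')
]
print(rows)
```

Each card yields one tuple, with `None` standing in for any piece that is missing.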
