BeautifulSoup not extracting text for a specific tag

Question

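I'm having trouble collecting information from a specific tag with BeautifulSoup. I want to extract the text for "Item 4" from between the html tags, but the code below returns the text associated with "Item 1". What am I doing wrong (e.g. slicing)?

Code: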

primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
  if item_4:
    for item_4_content in page_section.find('html'):
      print (item_4_content)

HTML:

<div class="detail-item">
   <h3>Item 1</h3>
   <html><body><p>Item 1 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 2</h3>
   <html><body><p>Item 2 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 3</h3>
   <html><body><p>Item 3 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 4</h3>
   <html><body><p>Item 4 text here</p></body></html>
</div>

Answer 1

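It looks like you want to print the contents of the <p> tag based on the text of the <h3> tag, right?

Your code needs to:

Load html_source
Find all 'div' tags whose 'class' equals 'detail-item'
For each match, check whether the .text of its <h3> tag contains the string 'Item 4'
If it does, print the .text of the corresponding <p> tag

The following code does that.

Code: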

s = '''<div class="detail-item">
   <h3>Item 1</h3>
   <html><body><p>Item 1 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 2</h3>
   <html><body><p>Item 2 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 3</h3>
   <html><body><p>Item 3 text here</p></body></html>
</div>

<div class="detail-item">
   <h3>Item 4</h3>
   <html><body><p>Item 4 text here</p></body></html>
</div>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(s, 'lxml')

primary_detail = soup.find_all('div', {'class': 'detail-item'})

for tag in primary_detail:
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)

Output:

Item 4 text here
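
As for why the original code printed the "Item 1" text: page_section.find('html') searches from the top of page_section and returns the first match, no matter which <h3> was found beforehand. Also, looping over the result of find() iterates over the children of that single tag rather than over a list of matches. A minimal sketch of an alternative that starts from the matching <h3> and navigates to its own container is shown below. The helper name text_under_heading, the shortened sample markup, and the choice of the built-in 'html.parser' are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the markup in the question (shortened).
SAMPLE_HTML = """
<div class="detail-item">
   <h3>Item 1</h3>
   <html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
   <h3>Item 4</h3>
   <html><body><p>Item 4 text here</p></body></html>
</div>
"""


def text_under_heading(soup, heading):
    """Return the text of the first <p> in the div whose <h3> equals `heading`."""
    # Locate the heading itself; find() returns None when nothing matches.
    h3 = soup.find('h3', string=heading)
    if h3 is None:
        return None
    # Walk up to the enclosing detail-item div, then search only inside it.
    container = h3.find_parent('div', class_='detail-item')
    if container is None or container.p is None:
        return None
    return container.p.get_text(strip=True)


# 'html.parser' ships with Python; 'lxml' as used in the answer should also work.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
item_4_text = text_under_heading(soup, 'Item 4')
print(item_4_text)
```

Because the search for <p> is scoped to the div that contains the matched heading, the result does not depend on the order of the items on the page.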

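Edit: On the site you provided, the first item in the loop has no <h3> tag, only an <h2>, so tag.h3 is None and has no .text value, right?

You can use a try/except clause to get around this error, as shown in the following code.

Code: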

from bs4 import BeautifulSoup
import requests


url = 'https://fortiguard.com/psirt/FG-IR-17-097'
html_source = requests.get(url).text

soup = BeautifulSoup(html_source, 'lxml')

primary_detail = soup.find_all('div', {'class': 'detail-item'})

for tag in primary_detail:
    try:
        # tag.h3 is None when the block has no <h3>, so .text raises AttributeError
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        continue

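If the code hits an exception, it moves on to the next element in the loop. As a result, it skips the first item, which has no <h3> tag and therefore no .text value.

Output: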

Upgrade to FortiWLC-SD version 8.3.0
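
An explicit None check is one alternative to try/except here, since it only skips blocks that lack the expected tags and lets any other error surface. The sketch below runs on a small inline sample rather than the live page. The sample markup, its text, and the helper name paragraphs_for are hypothetical and only stand in for the structure described above (a first block with an <h2> and no <h3>).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the live page: the first block has an <h2> but no <h3>.
SAMPLE_HTML = """
<div class="detail-item"><h2>Summary</h2><p>Summary text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade text here</p></div>
"""


def paragraphs_for(html, heading, parser='html.parser'):
    """Collect <p> text from each detail-item div whose <h3> contains `heading`."""
    soup = BeautifulSoup(html, parser)
    results = []
    for tag in soup.find_all('div', class_='detail-item'):
        h3 = tag.h3
        # Skip blocks without an <h3> or a <p> instead of catching an exception.
        if h3 is None or tag.p is None:
            continue
        if heading in h3.get_text():
            results.append(tag.p.get_text(strip=True))
    return results


solutions = paragraphs_for(SAMPLE_HTML, 'Solutions')
print(solutions)
```

Returning a list instead of printing inside the loop makes the result easier to reuse, and an empty list signals that no heading matched.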

Solution 1
2 Accepted 2017-04-24 16:40:10