BeautifulSoup not extracting specific tag text
I'm having a problem extracting the text for a specific tag using BeautifulSoup. I would like to extract the text for 'Item 4' inside the <html> tag, but the code below gets the text related to 'Item 1'. What am I doing wrong (e.g., slicing)?
Code:
primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
    if item_4:
        for item_4_content in page_section.find('html'):
            print (item_4_content)
HTML:
<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 2</h3>
<html><body><p>Item 2 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 3</h3>
<html><body><p>Item 3 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>
It looks like you want to print the <p> tag content according to the <h3> text value, correct?
Your code must:
1. Load html_source.
2. Search for all 'div' tags that contain a 'class' equal to 'detail-item'.
3. For each occurrence, check whether the .text value of the <h3> tag is equal to the string 'Item 4'.
4. If it is, print the .text value of the corresponding <p> tag.
You can achieve what you want by using the following code.
Code:
s = '''<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 2</h3>
<html><body><p>Item 2 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 3</h3>
<html><body><p>Item 3 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, 'lxml')
primary_detail = soup.find_all('div', {'class': 'detail-item'})
for tag in primary_detail:
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)
Output:
Item 4 text here
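As for why the original attempt printed Item 1: page_section.find('html') always searches from the top of page_section, so it returns the first <html> element no matter which <h3> was matched. A minimal sketch of another way to do it, assuming markup shaped like the sample above (shortened here to two items) and using the built-in 'html.parser' so lxml is not needed: start from the matched <h3> and move forward with find_next('p'). The helper name text_after_heading is made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup (assumption: the real page has the same shape).
SAMPLE = '''<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>'''


def text_after_heading(soup, heading):
    """Return the text of the first <p> that follows the <h3> whose text is `heading`."""
    h3 = soup.find('h3', string=heading)
    if h3 is None:          # no such heading on the page
        return None
    p = h3.find_next('p')   # search forward from the heading, not from the top
    return p.get_text(strip=True) if p is not None else None


soup = BeautifulSoup(SAMPLE, 'html.parser')
print(text_after_heading(soup, 'Item 4'))
```

Note that find_next('p') keeps searching past the end of the enclosing <div>, so if a block might lack a <p>, scoping the lookup to the parent 'detail-item' block is probably the safer choice.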
EDIT: On the provided website, the first element in the loop has no <h3> tag, only an <h2>, so tag.h3 is None and has no .text value, correct?
You can work around this error with a try/except clause, as in the following code.
Code:
from bs4 import BeautifulSoup
import requests
url = 'https://fortiguard.com/psirt/FG-IR-17-097'
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, 'lxml')
primary_detail = soup.find_all('div', {'class': 'detail-item'})
for tag in primary_detail:
    try:
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        # tag.h3 (or tag.p) is None for this block, so skip it
        continue
If the code raises an exception, it continues the iteration with the next element in the loop, so it skips the first item, which has no <h3> tag and therefore no .text value.
Output:
Upgrade to FortiWLC-SD version 8.3.0
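A narrower alternative to try/except is to test for the missing heading explicitly, which avoids hiding unrelated errors. The sketch below runs on made-up markup that only mimics the situation described above (a first block with an <h2> and no <h3>); the markup, the text in it, and the helper name paragraphs_under are assumptions for illustration, not taken from the actual site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first block has an <h2> but no <h3>, like the case described above.
SAMPLE = '''<div class="detail-item"><h2>Summary</h2><p>Summary text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade text here</p></div>'''


def paragraphs_under(soup, heading):
    """Collect <p> text from each detail-item block whose <h3> contains `heading`."""
    results = []
    for tag in soup.find_all('div', class_='detail-item'):
        h3 = tag.h3                      # None when the block has no <h3>
        if h3 is None or tag.p is None:  # skip incomplete blocks instead of raising
            continue
        if heading in h3.get_text():
            results.append(tag.p.get_text(strip=True))
    return results


soup = BeautifulSoup(SAMPLE, 'html.parser')
print(paragraphs_under(soup, 'Solutions'))
```

Returning a list rather than printing inside the loop makes it easier to check the result or reuse it elsewhere.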