
Extracting element from span with beautifulsoup

I am trying to extract the elements that are circled in red in the picture below from this website: https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25

[Image: screenshot of the page with the target elements circled in red]

However, it keeps giving me this error: "ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"

My idea is to narrow the search down to the "td" tags and then use find to get the element from the "span" tag, but I just can't get it to work. I tried both find() and find_all(), and both give me this error. Below is my code:

import requests
from bs4 import BeautifulSoup

url = "https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
b = soup.select('div.left-content table.snapshot-data-tbl tr td')
# This line raises the error: b is a ResultSet (a list of elements), not a single element
print(b.find("span", class_='positive').text)

Can I please get some help on this? Thanks!

soup.select returns a container holding every match for the CSS selector you pass it. The find method can only be called on a single element node from the page source, not on the set of results from soup.select. Instead, you can iterate over the results of soup.select to get the text from the target spans:

r = [i.get_text(strip=True) for i in soup.select('div.left-content table.snapshot-data-tbl tr td > span:nth-of-type(1)')[:3]]

Output:

['+0.62%', '$8.82T', '12.32%']
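
To see the difference between the result set and its elements without hitting the live site, here is a minimal offline sketch. The HTML string is hypothetical: it only mimics the assumed shape of the table and is not copied from the Fidelity page, so the class names and values are placeholders taken from the question and the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the assumed shape of the page's table.
html = """
<div class="left-content">
  <table class="snapshot-data-tbl">
    <tr>
      <td><span class="positive">+0.62%</span></td>
      <td><span>$8.82T</span></td>
      <td><span>12.32%</span></td>
    </tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.select("div.left-content table.snapshot-data-tbl tr td")

# select() returns a list-like collection of elements, not a single element,
# so calling .find() on the collection itself raises AttributeError.
error_message = None
try:
    cells.find("span")
except AttributeError as exc:
    error_message = str(exc)
print("find() on the collection failed:", error_message is not None)

# Each item in the collection is an element, so .find() works per item.
values = [td.find("span").get_text(strip=True) for td in cells]
print(values)
```

The same rule applies to find_all(): it also returns a collection, so index into it or loop over it before calling find().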

Where possible, it is usually cleaner to find the element by class or id than to use a longer CSS selector. In this case the data you want is in the first 3 td elements of the table with class="snapshot-data-tbl".

Here's another solution to extract the values:

from bs4 import BeautifulSoup
import requests

url = "https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find('table', class_="snapshot-data-tbl")
r = [td.find('span').text for td in table.find_all('td')[:3]]
print(r)

Output:

['+0.62%', '$8.82T', '12.32%']
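
One thing to watch for: find() returns None when nothing matches, so the code above would fail with an AttributeError if the page layout changed or the request returned a different page. A slightly more defensive variant might look like the sketch below. The helper name first_span_texts and the sample HTML are hypothetical, made up for illustration only.

```python
from bs4 import BeautifulSoup


def first_span_texts(html, table_class="snapshot-data-tbl", limit=3):
    """Return the text of the first <span> in each of the first `limit` cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:  # find() returns None when no table matches
        return []
    texts = []
    for td in table.find_all("td")[:limit]:
        span = td.find("span")
        if span is not None:  # skip cells that contain no <span>
            texts.append(span.get_text(strip=True))
    return texts


# Hypothetical sample markup, not taken from the real page.
sample = """
<table class="snapshot-data-tbl">
  <tr>
    <td><span>+0.62%</span></td>
    <td><span>$8.82T</span></td>
    <td><span>12.32%</span></td>
    <td><span>ignored</span></td>
  </tr>
</table>
"""

found = first_span_texts(sample)
missing = first_span_texts("<p>no table here</p>")
print(found)
print(missing)
```

Returning an empty list when the table is missing is just one possible choice; raising a descriptive exception may suit a scraper better if a missing table should stop the run.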
