BeautifulSoup4: Fail to find 'a' tag with specific href value by find()
Dynamically find href tag
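I'm trying to extract "Information Technology" as the output of my Beautiful Soup search, but I haven't been able to figure it out, because the "sector" value in the URL is dynamic and changes with the ticker symbol.
Can anyone tell me how to extract this information?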
<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=45">Information Technology</a>
My code:
import requests
from bs4 import BeautifulSoup

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
detail_tags_sector.find_all('a')
To get the text from an anchor element, you need to access the .text attribute on each anchor element.
So your code would change to:
import requests
from bs4 import BeautifulSoup

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
contents = []
html = requests.get(url).text
# 'html.parser' is the parser that ships with Python
detail_tags_sector = BeautifulSoup(html, 'html.parser')
for anchor in detail_tags_sector.find_all('a'):
    contents.append(anchor.text)
print(contents)
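To see what reading .text from each anchor returns without hitting the live page, here is a minimal sketch that parses an inline HTML string. The sample markup and the names sample_html and link_texts are assumptions for illustration; only the sector link comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the sector link is taken from the question.
# The ampersand in the href is written as &amp;, the well-formed encoding.
sample_html = """
<div>
  <a href="/home">Home</a>
  <a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Collect the visible text of every anchor on the page.
link_texts = [anchor.text for anchor in soup.find_all('a')]
print(link_texts)
```

Note that this returns the text of every link, so on a real page the sector name would be one entry among many.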
You can use any of the following options.
import requests
from lxml.html.soupparser import fromstring

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = fromstring(html)
# Select the text of any anchor whose text contains "Information Technology"
findSearch = soup.xpath('//a[contains(text(), "Information Technology")]/text()')
print(findSearch[0])
Or
from bs4 import BeautifulSoup
import requests

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
for link in detail_tags_sector.find_all('a'):
    print(link.text)
Or
from bs4 import BeautifulSoup
import requests

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    print(link.text)
Please let me know if this helps.
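The problem with these answers is that they collect the text of every link on the page, and there are quite a few. If you only want to pick out the "Information Technology" string, all you need to do is add: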
info = soup.select_one('[href*="sectors_in"]')
print(info.text)
Output:
Information Technology
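The attribute selector works because it matches on a fixed part of the href ("sectors_in") and ignores the sector number, which varies by ticker. Below is a minimal offline sketch of the same idea, assuming an inline sample page; the names sample_html, sector_name and sector_id are hypothetical, and reading the sector number from the query string is an optional extra step not in the original answer.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Hypothetical sample markup; only the sector link is taken from the question.
sample_html = """
<div>
  <a href="/home">Home</a>
  <a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Match the anchor whose href contains the fixed path fragment,
# regardless of the sector number that follows.
info = soup.select_one('a[href*="sectors_in_market"]')

if info is None:
    # select_one returns None when nothing matches.
    sector_name, sector_id = None, None
else:
    sector_name = info.get_text(strip=True)
    # Read the dynamic "sector" value out of the link's query string.
    query = parse_qs(urlparse(info['href']).query)
    sector_id = query.get('sector', [None])[0]

print(sector_name, sector_id)
```

Checking for None before reading .text avoids an AttributeError on pages where the sector link is absent.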