Beautiful Soup returning “[]”

I'm attempting to pull the company information off the Bloomberg company profile website using the code below:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.bloomberg.com/profile/company/AAPL:US'

source = requests.get(URL)

soup = BeautifulSoup(source.content, 'lxml')

company_name = soup.findAll('h1', class_= 'companyName__9bd88132')

company_description = soup.findAll('div', class_ = 'description__ce057c5c')

print(company_name)
print(company_description)

But I am only getting two "[]" back as a result. In the responses I've seen to similar questions, people said it's because the wrong divs are being selected, but I don't think that is the case here. Would someone know why it isn't working?

Edit: I've attached the section of HTML I am trying to pull from below:

<section class="companyProfileOverview__aa874298 up__e13cf193"><section class="info__d075c560"><h1 class="companyName__9bd88132">Apple Inc</h1><div class="description__ce057c5c">Apple Inc. designs, manufactures, and markets personal computers and related personal computing and mobile communication devices along with a variety of related software, services, peripherals, and networking solutions. Apple sells its products worldwide through its online stores, its retail stores, its direct sales force, third-party wholesalers, and resellers.</div></section><section class="currentPriceContainer"><p class="currentPriceLabel__f1524605">CURRENT PRICE</p><div><div class="inlineRow__7728fc34"><span class="tickerText__d2e1ee30">AAPL:US</span><span class="priceText__0feeaba3">343.99</span><span class="currency__bef924de">USD</span></div><span class="triangle__73a7d8b2 up__a3b61807"></span><div class="inlineRow__7728fc34"><span class="priceChange__5e691975">+10.53</span><span class="percentChange__3c14f7c4">+3.16%</span></div><div class="time__245ca7bb "><span>As of 08:00 PM EDT 06/09/2020 </span></div><a class="quoteLink__d3ac120b" href="/quote/AAPL:US">SEE QUOTE</a></div></section><div class="infoTable__96162ad6"><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">SECTOR</h2><div class="infoTableItemValue__e188b0cb">Technology</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">INDUSTRY</h2><div class="infoTableItemValue__e188b0cb">Hardware</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">SUB-INDUSTRY</h2><div class="infoTableItemValue__e188b0cb">Communications Equipment</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">FOUNDED</h2><div class="infoTableItemValue__e188b0cb">01/03/1977</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">ADDRESS</h2><div 
class="infoTableItemValue__e188b0cb">1 Infinite Loop
Cupertino, CA 95014
United States</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">PHONE</h2><div class="infoTableItemValue__e188b0cb">1-408-996-1010</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">WEBSITE</h2><div class="infoTableItemValue__e188b0cb">www.apple.com</div></section><section class="infoTableItem__1003ce53"><h2 class="infoTableItemLabel__c9a5d511">NO. OF EMPLOYEES</h2><div class="infoTableItemValue__e188b0cb">100000</div></section></div></section>

I am trying to pull the company name (companyName__9bd88132) and the company description (description__ce057c5c). Eventually I would like to pull the sector information as well.
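For the name, description, and sector fields mentioned above, a minimal sketch of the extraction step could look like the following, assuming the markup arrives as shown in the question. It runs against a trimmed offline copy of that HTML rather than the live site. The helper names `by_prefix` and `parse_profile` are hypothetical, and matching on the class-name prefix is a design choice made here on the assumption that suffixes such as `__9bd88132` are build-generated and may change.

```python
import re
from bs4 import BeautifulSoup

# Trimmed, shortened copy of the markup from the question so the sketch runs offline.
SAMPLE_HTML = """
<section class="info__d075c560">
  <h1 class="companyName__9bd88132">Apple Inc</h1>
  <div class="description__ce057c5c">Apple Inc. designs, manufactures, and markets personal computers.</div>
</section>
<div class="infoTable__96162ad6">
  <section class="infoTableItem__1003ce53">
    <h2 class="infoTableItemLabel__c9a5d511">SECTOR</h2>
    <div class="infoTableItemValue__e188b0cb">Technology</div>
  </section>
  <section class="infoTableItem__1003ce53">
    <h2 class="infoTableItemLabel__c9a5d511">INDUSTRY</h2>
    <div class="infoTableItemValue__e188b0cb">Hardware</div>
  </section>
</div>
"""


def by_prefix(prefix):
    """Hypothetical helper: a class_ filter matching classes that start with prefix."""
    return re.compile("^" + re.escape(prefix))


def parse_profile(html):
    """Hypothetical helper: collect name, description, and label/value pairs."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1", class_=by_prefix("companyName__"))
    description = soup.find("div", class_=by_prefix("description__"))

    info = {}
    for item in soup.find_all("section", class_=by_prefix("infoTableItem__")):
        label = item.find("h2")
        value = item.find("div")
        if label is not None and value is not None:
            info[label.get_text(strip=True)] = value.get_text(" ", strip=True)

    return {
        "name": name.get_text(strip=True) if name is not None else None,
        "description": description.get_text(strip=True) if description is not None else None,
        "info": info,
    }


result = parse_profile(SAMPLE_HTML)
print(result["name"])
print(result["info"].get("SECTOR"))
```

With a real response, the same function would take the downloaded HTML text in place of `SAMPLE_HTML`.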

Use this code, which sends browser-like request headers (including a random User-Agent):

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

URL = 'https://www.bloomberg.com/profile/company/AAPL:US'

# Build browser-like headers; ua.random supplies a random real-world User-Agent string.
ua = UserAgent()
hdr = {'User-Agent': ua.random,
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}
source = requests.get(URL, headers=hdr)

soup = BeautifulSoup(source.content, features="html.parser")
# print(soup)  # uncomment to inspect the HTML that was actually returned
company_name = soup.find_all('h1', class_='companyName__9bd88132')
company_description = soup.find_all('div', class_='description__ce057c5c')

print(company_name)
print(company_description)
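
If the lists still come back empty, it may help to check what the server actually returned before suspecting the selectors. The sketch below uses a hypothetical helper, `missing_markers` (not part of requests or Beautiful Soup), to report which expected class names are absent from the raw HTML. Both sample pages are made up for illustration. The idea that the site may serve a different page to non-browser clients is an assumption suggested by the header change above, not something confirmed here.

```python
def missing_markers(html, markers):
    """Hypothetical helper: return the markers that do not occur in the raw HTML text."""
    return [m for m in markers if m not in html]


# Class names taken from the question.
EXPECTED = ["companyName__9bd88132", "description__ce057c5c"]

# Made-up stand-ins for two possible responses, so the sketch runs offline.
profile_page = (
    '<h1 class="companyName__9bd88132">Apple Inc</h1>'
    '<div class="description__ce057c5c">Apple Inc. designs</div>'
)
other_page = "<html><body><p>Placeholder page without the profile markup</p></body></html>"

print(missing_markers(profile_page, EXPECTED))  # nothing missing
print(missing_markers(other_page, EXPECTED))    # both class names missing
```

Against a live response, the same check would be `missing_markers(source.text, EXPECTED)`, ideally alongside a look at `source.status_code`. If the class names are missing from the raw text, no selector will find them.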
