Scraping a website with BeautifulSoup

Question

I get an AttributeError when scraping:

import urllib2
from bs4 import BeautifulSoup

quote_page ='https://www.bloomberg.com/quote/SPX:IND'
page = urllib2.urlopen(quote_page)

soup = BeautifulSoup(page,'html.parser')

name_box = soup.find('h1', attires ={'class': 'name'})

name = name_box.text.strip()
print name

Traceback (most recent call last):
  File "word1.py", line 11, in <module>
    name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'

Viveks-MacBook-Pro:py vivek$

Answer 1

If you add

print(name_box)

you get

None
Traceback (most recent call last):
  File "C:/Users/devsurya/python/demo programs/b4s.py", line 13, in <module>
    name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
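find() returns None when no element matches, and the AttributeError comes from calling .text on that None. Below is a minimal defensive sketch, assuming you would rather get None back than an exception. The helper name text_or_none is hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_none(html, tag, css_class):
    """Return the stripped text of the first matching tag, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None (not an empty tag) when there is no match
    element = soup.find(tag, attrs={"class": css_class})
    if element is None:
        return None
    return element.get_text(strip=True)


print(text_or_none('<h1 class="name"> SPX Index </h1>', "h1", "name"))  # SPX Index
print(text_or_none("<h1>No class here</h1>", "h1", "name"))  # None
```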

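And when you do

print(soup)    # prints the following message, buried in a lot of HTML and CSS

We've detected unusual activity from your computer network

Also, soup.find('h1', attires ={'class': 'name'}) should be soup.find('h1', {'class': 'companyName__99a4824b'}). attires is not a BeautifulSoup parameter (the keyword is attrs), so it is treated as a filter on an attribute literally named attires, which no h1 has. But even with that fixed, the response you received is Bloomberg's bot-detection page rather than the quote page, so the element you are looking for is not in it.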
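The second positional argument of find() is attrs, so the corrected call can also be written with the class_ keyword. Here is a sketch against a made-up HTML snippet. The markup is an assumption standing in for the real Bloomberg page, and the class name is copied from the answer; it may well have changed since.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real quote page
html = '<div><h1 class="companyName__99a4824b">S&amp;P 500 Index</h1></div>'
soup = BeautifulSoup(html, "html.parser")

by_attrs = soup.find("h1", {"class": "companyName__99a4824b"})  # dict passed as attrs
by_keyword = soup.find("h1", class_="companyName__99a4824b")  # class_ avoids the reserved word

# Both calls return the same Tag object from the parse tree
print(by_attrs is by_keyword)  # True
print(by_attrs.text.strip())  # S&P 500 Index
```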

Answer 2

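Assuming you want the company name, I would use requests, and you need a couple of headers (you will need to test whether these stay consistent over time). I use a CSS attribute=value selector to get the element, with the "starts with" operator ^ in case the value is dynamic, i.e. I am assuming the companyName prefix of the class is constant. This makes it more robust across requests.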
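The [class^=companyName] selector matches any element whose class attribute value starts with companyName, so a changed hash suffix does not break it. The check below runs on invented markup (the suffixes are made up). One caveat worth testing: for a multi-valued class attribute, soupsieve matches against the whole space-separated string, so ^= fails if companyName is not the first class; [class*=companyName] is the looser alternative.

```python
from bs4 import BeautifulSoup

SELECTOR = "[class^=companyName]"  # class attribute value starts with "companyName"


def company_name(html):
    """Return the text of the first element matching SELECTOR, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(SELECTOR)
    return element.get_text(strip=True) if element is not None else None


# Two invented hash suffixes: the prefix selector matches both
print(company_name('<h1 class="companyName__99a4824b">SPX</h1>'))  # SPX
print(company_name('<h1 class="companyName__0f1e2d3c">SPX</h1>'))  # SPX

# companyName is not the first class, so ^= does not match
print(company_name('<h1 class="title companyName__99a4824b">SPX</h1>'))  # None
```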

import requests
from bs4 import BeautifulSoup as bs

quote_page ='https://www.bloomberg.com/quote/SPX:IND'
page = requests.get(quote_page, headers = {'User-Agent':'Mozilla/5.0', 'accept-language':'en-US,en;q=0.9'})
soup = bs(page.content,'lxml')
name_box = soup.select_one('[class^=companyName]')
name = name_box.text.strip()
print(name)
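The code above still calls .text on whatever select_one() returns, so a blocked request would raise the same AttributeError. The sketch below adds a timeout, an HTTP status check, and an explicit check for the block page. The marker phrase is an assumption taken from the message quoted in Answer 1, and fetch_company_name / parse_company_name are hypothetical names. It uses html.parser instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0", "accept-language": "en-US,en;q=0.9"}
# Assumption: phrase from the bot-detection page quoted in Answer 1
BLOCK_MARKER = "unusual activity"


def parse_company_name(html):
    """Return the company name, or raise ValueError if it cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    if BLOCK_MARKER in soup.get_text(" ").lower():
        raise ValueError("received a bot-detection page instead of the quote")
    element = soup.select_one("[class^=companyName]")
    if element is None:
        raise ValueError("no element with a class starting with companyName")
    return element.get_text(strip=True)


def fetch_company_name(url):
    import requests  # imported here so parse_company_name works without requests

    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return parse_company_name(response.content)


# Usage (performs a real network request):
# print(fetch_company_name("https://www.bloomberg.com/quote/SPX:IND"))
```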


Answer 1: score 1, posted 2019-08-07 20:33:43
Answer 2: score 0, posted 2019-08-07 20:35:53
