Scrape websites using BeautifulSoup
I get an AttributeError when scraping:
import urllib2
from bs4 import BeautifulSoup
quote_page ='https://www.bloomberg.com/quote/SPX:IND'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page,'html.parser')
name_box = soup.find('h1', attires ={'class': 'name'})
name = name_box.text.strip()
print name
Traceback (most recent call last):
  File "word1.py", line 11, in <module>
    name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
Viveks-MacBook-Pro:py vivek$
When you do
print(name_box)
you get
None
Traceback (most recent call last):
File "C:/Users/devsurya/python/demo programs/b4s.py", line 13, in <module>
name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
When you do this:
print(soup) ## it prints the following message along with some odd HTML and CSS
We've detected unusual activity from your computer network
Also, soup.find('h1', attires ={'class': 'name'})
should be soup.find('h1', {'class': 'companyName__99a4824b'})
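The diagnosis above can be sketched roughly as follows. This is only an illustration: the helper name extract_name and the two inline HTML strings are made up here to stand in for the blocked page and a normal page, and they are not real Bloomberg markup. The point is that find returns None when nothing matches, so checking the result before reading .text avoids the AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two kinds of response; not real Bloomberg markup.
BLOCKED_HTML = (
    "<html><body><p>We've detected unusual activity "
    "from your computer network</p></body></html>"
)
NORMAL_HTML = (
    '<html><body><h1 class="companyName__99a4824b"> Example Index </h1>'
    "</body></html>"
)


def extract_name(html):
    """Return the stripped heading text, or None if no matching <h1> exists."""
    soup = BeautifulSoup(html, "html.parser")
    name_box = soup.find("h1", {"class": "companyName__99a4824b"})
    if name_box is None:
        # find() found nothing, e.g. because the site served a block page.
        return None
    return name_box.text.strip()


print(extract_name(BLOCKED_HTML))  # no <h1> in the block page
print(extract_name(NORMAL_HTML))
```

If the helper returns None for a live response, printing the soup (as above) is a quick way to see whether the site served a block page instead of the quote page.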
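Assuming you want the company name, I would use requests and send a couple of headers (you will need to test whether this keeps working consistently over time). I use a CSS attribute = value selector to get the appropriate element, with the starts-with operator ^ in case the value is dynamic, i.e. I assume the leading string companyName is constant. This makes it more generic for other requests.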
import requests
from bs4 import BeautifulSoup as bs
quote_page ='https://www.bloomberg.com/quote/SPX:IND'
page = requests.get(quote_page, headers = {'User-Agent':'Mozilla/5.0', 'accept-language':'en-US,en;q=0.9'})
soup = bs(page.content,'lxml')
name_box = soup.select_one('[class^=companyName]')
name = name_box.text.strip()
print(name)
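To see why the starts-with selector helps, here is a small offline sketch. The two snippets and the helper name company_name are invented for illustration, with different hash suffixes on the class, and html.parser is used only so the example does not need lxml. The same selector matches both, and the None check keeps it from crashing when nothing matches.

```python
from bs4 import BeautifulSoup

# Invented markup: only the hash suffix of the class differs between snippets.
SNIPPETS = [
    '<h1 class="companyName__99a4824b">Example Index</h1>',
    '<h1 class="companyName__0f3c11aa">Another Index</h1>',
]


def company_name(html):
    """Return the text of the first element whose class attribute starts with 'companyName'."""
    soup = BeautifulSoup(html, "html.parser")
    name_box = soup.select_one("[class^=companyName]")
    return name_box.text.strip() if name_box is not None else None


names = [company_name(snippet) for snippet in SNIPPETS]
print(names)
```

Note that ^= compares against the start of the whole class attribute value, so if another class name came first in the attribute, a contains selector such as [class*=companyName] would be needed instead.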
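Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.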