AttributeError: 'NoneType' object has no attribute 'html'
I am trying to abstract my scraper so that I can extend it to other web pages in the project, but when I run my program I get an error. This is the code:
from requests_html import HTMLSession
from abc import ABC, abstractmethod
from bs4 import BeautifulSoup as bs

class AbstractClass(ABC):
    @abstractmethod
    def __init__(self, url):
        self.url = url

    @abstractmethod
    def getSession(self):
        self.session = HTMLSession()
        self.url_ = self.session.get(self.url)
        self.url_ = self.url_.html.render(timeout=20)
        self.soup = bs(self.url_.html.html, 'lxml')
        print(self.soup.prettify())  # to check the output

class Santander(AbstractClass):
    def __init__(self, url):
        super().__init__(url)

    def getSession(self):
        super().getSession()

santander = Santander('https://banco.santander.cl/beneficios')
santander.getSession()
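I get the error below. I think it comes from using the library incorrectly (I use "from requests_html import HTMLSession" to handle any JS the page may have). I have tried moving and changing a few things, but it keeps failing.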
Traceback (most recent call last):
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 41, in <module>
    santander.getSession()
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 38, in getSession
    return super().getSession()
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 15, in getSession
    self.soup = bs(self.url_.html.html, 'lxml')
AttributeError: 'NoneType' object has no attribute 'html'
This is my initial code, from before I started abstracting it. It worked fine.
from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession
session = HTMLSession()
url_ = 'https://banco.santander.cl/beneficios'
url= session.get(url_)
url.html.render(timeout=20)
soup = bs(url.html.html, 'lxml')
#print(soup.prettify())
page_santander = soup.find("section", id="section-promotions")
container = page_santander.find("div", class_="container")
grid = container.find_all("div", class_="row mini")[0].find_all("div",class_="d-block h-100 cursor-pointer")
#print(len(grid))
for i in range(0, len(grid)):
    title = grid[i].find("h2").get_text()
    summary = grid[i].find("p").get_text()
    #discountUrl = grid[i].find("a").get('href')
    print(title)
    print(summary)
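Try changing this, where the return value of render() is assigned back to self.url_ (the traceback shows that value is None):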
self.url_ = self.session.get(self.url)
self.url_ = self.url_.html.render(timeout=20)
self.soup = bs(self.url_.html.html, 'lxml')
to this:
current_url = self.session.get(self.url)
current_url.html.render(timeout=20)
self.soup = bs(current_url.html.html, 'lxml')
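
The difference between the two versions can be reproduced without a network or browser. The sketch below is only an illustration: FakeHTML, FakeResponse, FakeSession, AbstractScraper, fetch_rendered_html, parse and buggy_fetch are hypothetical names that do not belong to requests_html. It assumes, as the traceback suggests, that render() updates the object in place and returns None when no script is passed.

```python
from abc import ABC, abstractmethod


class FakeHTML:
    """Hypothetical stand-in for a response's .html object."""

    def __init__(self, raw):
        self.html = raw

    def render(self, timeout=8):
        # Assumption: like the real call in the question, this mutates the
        # object in place and returns None.
        self.html = self.html.replace("<!--js-->", "<p>rendered</p>")
        return None


class FakeResponse:
    """Hypothetical stand-in for what session.get() returns."""

    def __init__(self, raw):
        self.html = FakeHTML(raw)


class FakeSession:
    """Hypothetical stand-in for HTMLSession; performs no network access."""

    def get(self, url):
        return FakeResponse("<div><!--js--></div>")


class AbstractScraper(ABC):
    """Shared fetching logic; each site only implements parse()."""

    def __init__(self, url, session):
        self.url = url
        self.session = session

    def fetch_rendered_html(self):
        response = self.session.get(self.url)
        response.html.render(timeout=20)  # in place: do not assign the result
        return response.html.html

    @abstractmethod
    def parse(self, html):
        ...


class Santander(AbstractScraper):
    def parse(self, html):
        # Placeholder parsing step for the illustration.
        return "rendered" in html


def buggy_fetch(session, url):
    """Same pattern as the failing code: the response is replaced by None."""
    response = session.get(url)
    response = response.html.render(timeout=20)  # response is now None
    return response.html.html  # raises AttributeError


if __name__ == "__main__":
    scraper = Santander("https://banco.santander.cl/beneficios", FakeSession())
    print(scraper.fetch_rendered_html())
```

In this sketch only parse() is abstract, so the shared fetching code lives once in the base class and subclasses do not need to override __init__ or call super() just to reuse it. With the real library, HTMLSession() would presumably be passed in place of FakeSession().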