Why does this Python script work on my local machine but not on Heroku?
I'm building a simple scraping tool. Here's the code that I have for it.
from bs4 import BeautifulSoup
import requests
from lxml import html
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('Programming 4 Marketers-File-goes-here.json', scope)

site = 'http://nathanbarry.com/authority/'
hdr = {'User-Agent':'Mozilla/5.0'}
req = requests.get(site, headers=hdr)
soup = BeautifulSoup(req.content)

def getFullPrice(soup):
    divs = soup.find_all('div', id='complete-package')
    price = ""
    for i in divs:
        price = i.a
    completePrice = (str(price).split('$',1)[1]).split('<', 1)[0]
    return completePrice

def getVideoPrice(soup):
    divs = soup.find_all('div', id='video-package')
    price = ""
    for i in divs:
        price = i.a
    videoPrice = (str(price).split('$',1)[1]).split('<', 1)[0]
    return videoPrice

fullPrice = getFullPrice(soup)
videoPrice = getVideoPrice(soup)
date = datetime.date.today()

gc = gspread.authorize(credentials)
wks = gc.open("Authority Tracking").sheet1
row = len(wks.col_values(1))+1
wks.update_cell(row, 1, date)
wks.update_cell(row, 2, fullPrice)
wks.update_cell(row, 3, videoPrice)
This script runs on my local machine. But when I deploy it as part of an app to Heroku and try to run it, I get the following error:
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 219, in put_feed
    r = self.session.put(url, data, headers=headers)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 82, in put
    return self.request('PUT', url, params=params, data=data, **kwargs)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
    response.status_code, response.content))
gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "AuthorityScraper.py", line 44, in <module>
    wks.update_cell(row, 1, date)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/models.py", line 517, in update_cell
    self.client.put_feed(uri, ElementTree.tostring(feed))
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 221, in put_feed
    if ex[0] == 403:
TypeError: 'RequestError' object does not support indexing
What do you think might be causing this error? Do you have any suggestions for how I can fix it?
There are a couple of things going on:
1) The Google Sheets API returned an error, "Invalid query parameter value for cell_id":
gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")
2) A bug in gspread caused an exception upon receipt of the error:
TypeError: 'RequestError' object does not support indexing
Python 3 removed __getitem__ from BaseException, which this gspread error handling relies on. This doesn't matter too much because it would have raised an UpdateCellError exception anyway.
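The Python 3 behaviour behind the second point can be reproduced in isolation. This is a minimal sketch, not gspread's actual code: the RequestError class below is a hypothetical stand-in for gspread.exceptions.RequestError, and reading ex.args[0] is shown only as the Python 3 way to reach what ex[0] used to return.

```python
class RequestError(Exception):
    """Hypothetical stand-in for gspread.exceptions.RequestError."""


def status_code(ex):
    """Read the first constructor argument (here, the HTTP status) via args."""
    return ex.args[0]


def indexing_fails(ex):
    """Return True if ex[0] raises TypeError, as it does on Python 3."""
    try:
        ex[0]
    except TypeError:
        return True
    return False


err = RequestError(400, "400: b'Invalid query parameter value for cell_id.'")
print(status_code(err))
print(indexing_fails(err))
```

On Python 2 the expression ex[0] would have worked, which is presumably why the check in put_feed was written that way.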
My guess is that you are passing an invalid row number to update_cell. It would be helpful to add some debug logging to your script to show, for example, which row it is trying to update.
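The debug logging suggested above could look something like the following. It is only a sketch: the logger name and the helper functions next_row and logged_update are hypothetical, and the only gspread call assumed is update_cell(row, col, value), as used in the question's script.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("authority_scraper")  # hypothetical logger name


def next_row(first_column_values):
    """Return the 1-based index of the row after the existing values."""
    return len(first_column_values) + 1


def logged_update(wks, row, col, value):
    """Log the target cell and the value's type, then call wks.update_cell."""
    # gspread rows and columns are 1-based, so anything below 1 is invalid.
    if not isinstance(row, int) or row < 1:
        raise ValueError("invalid row number: %r" % (row,))
    if not isinstance(col, int) or col < 1:
        raise ValueError("invalid column number: %r" % (col,))
    logger.info("update_cell row=%r col=%r value=%r (%s)",
                row, col, value, type(value).__name__)
    wks.update_cell(row, col, value)
```

In the script this would replace the direct calls, for example logged_update(wks, next_row(wks.col_values(1)), 1, date), so the Heroku logs show exactly which row and value were sent before the API rejects the request.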
It may be better to start with a worksheet with zero rows and use append_row instead. However, there does seem to be an outstanding issue in gspread with append_row, and it may actually be the same issue you are running into.
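A rough sketch of the append_row approach is shown here, with the caveat above that append_row may hit the same gspread issue. The helpers build_row and append_prices are hypothetical names, and converting the date to an ISO string before sending it is an assumption, not something the traceback confirms is required.

```python
import datetime


def build_row(full_price, video_price, today=None):
    """Build one spreadsheet row as plain strings: date, full price, video price."""
    today = today or datetime.date.today()
    # Assumption: send the date as text rather than a datetime.date object.
    return [today.isoformat(), str(full_price), str(video_price)]


def append_prices(wks, full_price, video_price, today=None):
    """Append the row to the worksheet with a single append_row call."""
    row = build_row(full_price, video_price, today)
    wks.append_row(row)
    return row
```

With the question's variables this would be called as append_prices(wks, fullPrice, videoPrice), which removes the need to compute the row number at all.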
I encountered the same problem. BS4 works fine on a local machine. However, for some reason, it is far too slow on the Heroku server, which results in an error. I switched to lxml and it is working fine now.
Install it with this command:
pip install lxml
A sample code snippet is given below:
from lxml import html
import requests

# Fetch the page and parse it into an lxml element tree.
getpage = requests.get("https://url_here")
gethtmlcontent = html.fromstring(getpage.content)

# Sample XPath: the text of every div with the dummy class "class-name".
data = gethtmlcontent.xpath('//div[@class="class-name"]/text()')

n = 10  # placeholder: how many items to keep, as per your requirement
data = data[0:n]

# Now inject the data into the Django template.