Why does this Python script run on my local machine but not on Heroku?

Question

I'm building a simple scraping tool. Here's the code that I have for it.

from bs4 import BeautifulSoup
import requests
from lxml import html
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime

scope = ['https://spreadsheets.google.com/feeds']

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'Programming 4 Marketers-File-goes-here.json', scope)

site = 'http://nathanbarry.com/authority/'
hdr = {'User-Agent':'Mozilla/5.0'}
req = requests.get(site, headers=hdr)

soup = BeautifulSoup(req.content)

def getFullPrice(soup):
    divs = soup.find_all('div', id='complete-package')
    price = ""
    for i in divs:
        price = i.a
    completePrice = (str(price).split('$',1)[1]).split('<', 1)[0]
    return completePrice


def getVideoPrice(soup):
    divs = soup.find_all('div', id='video-package')
    price = ""
    for i in divs:
        price = i.a
    videoPrice = (str(price).split('$',1)[1]).split('<', 1)[0]
    return videoPrice

fullPrice = getFullPrice(soup)
videoPrice = getVideoPrice(soup)
date = datetime.date.today()

gc = gspread.authorize(credentials)
wks = gc.open("Authority Tracking").sheet1

row = len(wks.col_values(1))+1

wks.update_cell(row, 1, date)
wks.update_cell(row, 2, fullPrice)
wks.update_cell(row, 3, videoPrice)

This script runs on my local machine. But when I deploy it as part of an app to Heroku and try to run it, I get the following error:

Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 219, in put_feed
    r = self.session.put(url, data, headers=headers)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 82, in put
    return self.request('PUT', url, params=params, data=data, **kwargs)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
    response.status_code, response.content))
gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "AuthorityScraper.py", line 44, in <module>
    wks.update_cell(row, 1, date)
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/models.py", line 517, in update_cell
    self.client.put_feed(uri, ElementTree.tostring(feed))
  File "/app/.heroku/python/lib/python3.6/site-packages/gspread/client.py", line 221, in put_feed
    if ex[0] == 403:
TypeError: 'RequestError' object does not support indexing

What do you think might be causing this error? Do you have any suggestions for how I can fix it?

Answer 1

There are a couple of things going on:

1) The Google Sheets API returned an error, "Invalid query parameter value for cell_id":

gspread.exceptions.RequestError: (400, "400: b'Invalid query parameter value for cell_id.'")

2) A bug in gspread caused a second exception while handling that error:

TypeError: 'RequestError' object does not support indexing

Python 3 removed __getitem__ from BaseException, which this gspread error-handling code relies on. This doesn't matter too much, because it would have raised an UpdateCellError exception anyway.
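The indexing failure can be reproduced without gspread. The following is a minimal sketch that uses a hypothetical RequestError stand-in class (not gspread's real class). It shows that in Python 3 the values passed to an exception are reachable through .args rather than by indexing the exception itself:

```python
class RequestError(Exception):
    """Hypothetical stand-in for gspread.exceptions.RequestError."""


def status_code(ex):
    # Python 3 exceptions cannot be indexed, so ex[0] raises TypeError.
    # The constructor arguments are available as the ex.args tuple instead.
    return ex.args[0]


def indexing_fails(ex):
    # Returns True if ex[0] raises TypeError, as in the second traceback.
    try:
        ex[0]
    except TypeError:
        return True
    return False


error = RequestError(400, "400: b'Invalid query parameter value for cell_id.'")
print(indexing_fails(error))  # True on Python 3
print(status_code(error))     # 400
```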

My guess is that you are passing an invalid row number to update_cell. It would be helpful to add some debug logging to your script to show, for example, which row it is trying to update.
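One way to add that logging is sketched below. The helper name describe_update and the sample values are made up for illustration. The row calculation mirrors the question's len(wks.col_values(1)) + 1, and the log lines show the row, column, value, and value type that each update_cell call would receive:

```python
import datetime
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("AuthorityScraper")


def describe_update(first_column_values, values):
    # Hypothetical helper: computes the target row the same way the
    # question's script does, then logs each pending update_cell call.
    row = len(first_column_values) + 1
    log.debug("column 1 holds %d values, so the next row is %d",
              len(first_column_values), row)
    for col, value in enumerate(values, start=1):
        log.debug("update_cell(row=%d, col=%d, value=%r, type=%s)",
                  row, col, value, type(value).__name__)
    return row


# Made-up sample data; in the real script the first argument would be
# wks.col_values(1) and the second [date, fullPrice, videoPrice].
row = describe_update(["Date", "2017-08-27"],
                      [datetime.date(2017, 8, 28), "39", "99"])
```

Running this on Heroku and comparing the logged row and value types with a local run should show whether the two environments send different arguments.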

It may be better to start with a worksheet that has zero rows and use append_row instead. However, there does seem to be an outstanding issue in gspread with append_row, and it may actually be the same issue you are running into.
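A rough sketch of the append_row approach follows. The function append_price_row and the FakeWorksheet class are hypothetical; the fake exists only so the example runs without Google credentials. Converting the date and prices to plain strings is an assumption here, not a confirmed fix for the error:

```python
import datetime


def append_price_row(worksheet, date, full_price, video_price):
    # Build the whole row up front and let the worksheet pick the position,
    # instead of computing a row number and calling update_cell three times.
    # Assumption: values are sent as plain strings.
    row = [date.isoformat(), str(full_price), str(video_price)]
    worksheet.append_row(row)
    return row


class FakeWorksheet:
    """Hypothetical in-memory stand-in for a gspread worksheet."""

    def __init__(self):
        self.rows = []

    def append_row(self, values):
        self.rows.append(list(values))


sheet = FakeWorksheet()
append_price_row(sheet, datetime.date(2017, 8, 28), "39", "99")
print(sheet.rows)  # [['2017-08-28', '39', '99']]
```

In the real script, the worksheet returned by gc.open("Authority Tracking").sheet1 would be passed in place of the fake.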

Answer 2

I encountered the same problem. BS4 works fine on a local machine. However, for some reason, it is far too slow on the Heroku server, which results in an error.

I switched to lxml and it is working fine now.

Install it with this command:

pip install lxml

A sample code snippet is given below:

from lxml import html
import requests

# Fetch the page and parse it into an lxml element tree
getpage = requests.get("https://url_here")
gethtmlcontent = html.fromstring(getpage.content)

# Sample XPath that fetches the text of a dummy div
data = gethtmlcontent.xpath('//div[@class="class-name"]/text()')

n = 5  # number of items to keep; adjust as per your requirement
data = data[0:n]

# Now inject the data into the Django template.
