Python HTTP Error 429 (Too Many Requests)

Question

I used to fetch a CSV file from a URL and load it directly into a Pandas dataframe like this:

import pandas as pd

grab_csv = 'https://XXXX.XX/data.csv'
pd_data = pd.read_csv(grab_csv).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

Since today, I get urllib.error.HTTPError: HTTP Error 429: Too Many Requests. What I tried in order to fix it:

import pandas as pd
import requests
from io import StringIO

grab_csv = 'https://XXXX.XX/data.csv'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
        
res_grab_data = requests.get(StringIO(grab_csv), headers=headers).text

pd_data = pd.read_csv(res_grab_data).drop(columns=['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 4', 'Column 5', 'Column 6', 'Column 7'])

This time, I get the error requests.exceptions.MissingSchema: Invalid URL '<_io.StringIO object at 0x0000012B7C622A20>': No schema supplied. Perhaps you meant http://<_io.StringIO object at 0x0000012B7C622A20>?

Any idea how I can solve the HTTP Error 429 with pandas and requests?

Answer 1

The error comes from the web server you are making the requests to, almost certainly because you're issuing requests too quickly and they don't like it. It's not because of an error in your code.

Your attempt at fixing it doesn't make much sense: StringIO lets you use an in-memory string as if it were a file object. Passing it as a parameter to requests.get isn't a valid use case. You should call requests.get(grab_csv, ...) with the URL string itself, because .get() expects the url parameter to be a string. StringIO belongs on the other side of the request, wrapping the downloaded text before it is handed to pd.read_csv.
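For illustration, here is a minimal sketch of the corrected requests/StringIO usage. The function names, the shortened User-Agent, and the timeout value are assumptions made for this example, and the URL is the placeholder from the question. Note that this only fixes the MissingSchema error; it does nothing about the 429 itself.

```python
from io import StringIO

import pandas as pd
import requests

# Placeholder values based on the question; replace with the real ones.
GRAB_CSV = 'https://XXXX.XX/data.csv'
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def frame_from_csv_text(csv_text, drop_columns=()):
    """Parse CSV text held in memory and drop the unwanted columns."""
    # StringIO belongs here: it makes the downloaded text look like a file.
    frame = pd.read_csv(StringIO(csv_text))
    return frame.drop(columns=list(drop_columns))


def fetch_csv_frame(url, headers=None, drop_columns=(), timeout=30):
    """Download a CSV with requests and load it into a DataFrame."""
    # The URL is passed as a plain string, not wrapped in StringIO.
    response = requests.get(url, headers=headers, timeout=timeout)
    # A 429 surfaces here as requests.exceptions.HTTPError.
    response.raise_for_status()
    return frame_from_csv_text(response.text, drop_columns)
```

With the question's data this would be called as fetch_csv_frame(GRAB_CSV, headers=HEADERS, drop_columns=['Column 1', 'Column 2']).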

I'd consult the documentation for the API you're using (if there is any), and slow down your rate of requests to stay within its limits.
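If the limits are not documented, one common approach is to wait and retry whenever a 429 comes back. The sketch below is only an illustration of that idea, not something the server in the question is known to support: it assumes the server may send a Retry-After header given in seconds, and it falls back to exponential backoff otherwise. The names retry_delay and get_with_backoff, and the default delays, are made up for this example.

```python
import time


def retry_delay(retry_after, attempt, base_delay=1.0, max_delay=60.0):
    """Seconds to wait before the next attempt after a 429 response."""
    # Retry-After may hold a number of seconds; honor it when it parses.
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), max_delay)
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    # Otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
    return min(base_delay * 2 ** attempt, max_delay)


def get_with_backoff(get, url, max_attempts=5, sleep=time.sleep, **kwargs):
    """Call get(url, **kwargs) until the status is not 429 or attempts run out."""
    for attempt in range(max_attempts):
        response = get(url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(retry_delay(response.headers.get('Retry-After'), attempt))
    return response  # still 429 after the final attempt
```

The get argument can be requests.get, for example get_with_backoff(requests.get, grab_csv, headers=headers). Passing the callable in keeps the retry logic independent of the HTTP library and easy to try out without hitting the server.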

There is a neat Python package (aptly named ratelimit) that lets you decorate your function to enforce the rate limiting: https://pypi.org/project/ratelimit/

Answer 1: accepted, 2020-11-03 17:06:22
