How do I get all 1000 results using the GitHub Search API?
I understand that the GitHub Search API is limited to 1000 results per query and 100 results per page. I therefore wrote the following to page through all 1000 results of a code search for the string torch:
import requests
for i in range(1,11):
    url = "https://api.github.com/search/code?q=torch +in:file + language:python&per_page=100&page="+str(i)
    headers = {
        'Authorization': 'xxxxxxxx'
    }
    response = requests.request("GET", url, headers=headers).json()
    try:
        print(len(response['items']))
    except:
        print("response = ", response)
Here is the output:
15
62
response = {'documentation_url': 'https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits', 'message': 'You have exceeded a secondary rate limit. Please wait a few minutes before you try again.'}
Is there an efficient way to get all 1000 results from the Search API?
There are two things happening here:
The Search API has different rate limits from the rest of the REST API. See the GitHub documentation:
The REST API for searching items has a custom rate limit that is separate from the rate limit governing the other REST API endpoints.
I would recommend requesting fewer results per page to address the incomplete pages (the 15 and 62 items returned above instead of 100).
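As a rough sketch of how a short page could be told apart from a truncated one: the Search API response body carries an `incomplete_results` boolean next to `items`, which is set when the query timed out on the server. The helper name `summarize_page` and the sample payload are hypothetical and only illustrate the check.

```python
def summarize_page(payload):
    """Return (item_count, looks_complete) for one decoded search response."""
    items = payload.get("items", [])
    # `incomplete_results` is True when the server stopped the query early.
    timed_out = bool(payload.get("incomplete_results", False))
    return len(items), not timed_out


# Hand-written payload for illustration, not real API output.
sample = {"total_count": 1000, "incomplete_results": True, "items": [{}] * 15}
print(summarize_page(sample))
```

A page flagged as incomplete could be retried, possibly with a smaller `per_page`, rather than accepted as-is.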
You will also need to be very deliberate about the requests you're making, because the limits are low. Getting the full 1000 may be impossible without requesting a rate increase or implementing a very long backoff.
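One way to be deliberate is to let the response headers decide how long to wait instead of guessing. The sketch below assumes the headers GitHub documents for rate limiting (`Retry-After`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`); the function name `seconds_to_wait` and the 60-second fallback are my own choices, not part of any library.

```python
import time


def seconds_to_wait(headers, now=None, default=60.0):
    """Suggest a pause in seconds after a rate-limited response."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        # The server says exactly how many seconds to wait.
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # The reset time is given in UTC epoch seconds.
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    # No hint from the server: fall back to a conservative pause.
    return default
```

With `requests`, this could be called as `time.sleep(seconds_to_wait(response.headers))` before retrying a request that was rejected with status 403 or 429.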
I modified your code to add a primitive exponential backoff, but this still doesn't produce the full 1000 results and takes a while:
import requests
import time

headers = {
    'Authorization': 'token <TOKEN>'
}
results = []
for i in range(1, 31):
    url = "https://api.github.com/search/code?q=torch +in:file + language:python&per_page=33&page="+str(i)
    backoff = 2  # backoff in seconds
    while backoff < 1024:
        time.sleep(backoff)
        try:
            response = requests.request("GET", url, headers=headers)
            response.raise_for_status()  # throw an exception for HTTP 400 and 500s
            data = response.json()
            results.extend(data['items'])  # keep one flat list of items
            print(f'Got {len(data["items"])} results for page {i}.')
            break
        except requests.exceptions.RequestException as e:
            print('ERROR: Failed to make request: ', e)
            backoff *= 2  # double the wait after each failure
    if backoff >= 1024:
        print('ERROR: Backoff limit reached.')
        break