How do I get all 1000 results using the GitHub Search API?

I understand that the GitHub Search API is limited to 1000 results in total and 100 results per page. I therefore wrote the following to retrieve all 1000 results for a code search that looks for the string torch:

import requests
for i in range(1,11):
    url = "https://api.github.com/search/code?q=torch +in:file + language:python&per_page=100&page="+str(i)

    headers = {
    'Authorization': 'xxxxxxxx'
    }

    response = requests.request("GET", url, headers=headers).json()
    try:
        print(len(response['items']))
    except:
        print("response = ", response)

Here is the output:

15
62
response =  {'documentation_url': 'https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits', 'message': 'You have exceeded a secondary rate limit. Please wait a few minutes before you try again.'}
  1. It seems to hit the secondary rate limit right after the second iteration.
  2. The number of results per page isn't consistent. For instance, page 1 returned 15 results on this run, but if I run it again it returns a different number. I believe there should be 100 results per page.

Is there an efficient way to get all 1000 results from the Search API?

There are two things happening here:

  1. You are receiving incomplete results because the query is timing out.
  2. You are being rate limited.

The Search API has its own rate limits. See the GitHub documentation:

The REST API for searching items has a custom rate limit that is separate from the rate limit governing the other REST API endpoints.
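One way to respect that separate limit is to read the rate-limit headers GitHub returns with each response and pause until the window resets. The sketch below is only an illustration and was not part of the original answer: the helper name `seconds_to_wait` is hypothetical, and it assumes the response carries the `Retry-After`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds to pause before the next search request.

    `headers` is a mapping of response headers (for example `response.headers`
    from requests). Hypothetical helper, shown for illustration only.
    """
    now = time.time() if now is None else now
    # Normalise header names so a plain dict works as well as requests' headers.
    h = {k.lower(): v for k, v in headers.items()}

    # Secondary rate limits may include Retry-After (seconds to wait).
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))

    # Primary limit: once no requests remain, wait until the reset time,
    # which is given in seconds since the epoch.
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        return max(0.0, float(h["x-ratelimit-reset"]) - now)

    return 0.0
```

Calling `time.sleep(seconds_to_wait(response.headers))` before the next request would then replace a fixed guess with the delay the server asks for.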

I would recommend requesting fewer results per page to address the incomplete results.
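Search responses carry an `incomplete_results` flag alongside `total_count` and `items`, and it is set when the query timed out. A minimal sketch of checking it follows; the function name `page_is_complete` and the sample payloads are made up for illustration and are not real API output.

```python
def page_is_complete(payload):
    """Return True when a search response page was not cut short by a timeout."""
    # A payload without 'items' is an error document (e.g. a rate-limit message).
    if "items" not in payload:
        return False
    # GitHub sets incomplete_results to True when the query timed out.
    return not payload.get("incomplete_results", False)


# Illustrative payloads, not real API output.
timed_out = {"total_count": 1000, "incomplete_results": True, "items": [{}] * 15}
complete = {"total_count": 40, "incomplete_results": False, "items": [{}] * 40}
print(page_is_complete(timed_out))  # False
print(page_is_complete(complete))   # True
```

A page that fails this check could be requested again, possibly with a smaller `per_page`, rather than being accepted as final.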

You will also need to be very deliberate about the requests you're making, because the limits are low. Getting the full 1000 may be impossible without requesting a rate limit increase or implementing a very long backoff.

I modified your code to add a primitive exponential backoff, but this still doesn't produce the full 1000 results and takes a while:

import requests
import time

headers = {
    'Authorization': 'token <TOKEN>'
}

results = []
for i in range(1, 31):
    url = "https://api.github.com/search/code?q=torch +in:file + language:python&per_page=33&page="+str(i)
    backoff = 2  # backoff in seconds
    while backoff < 1024:
        time.sleep(backoff)  # wait before every attempt, including the first
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()  # raise an exception for HTTP 4xx and 5xx responses
            data = response.json()
            results.extend(data['items'])  # collect the items into one flat list
            print(f'Got {len(data["items"])} results for page {i}.')
            break
        except requests.exceptions.RequestException as e:
            print('ERROR: Failed to make request: ', e)
            backoff *= 2  # double the wait after each failed attempt
    if backoff >= 1024:
        print('ERROR: Backoff limit reached.')
        break
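
To get a feel for how long a single page can stall in the worst case, the delays of a doubling backoff can be listed directly. This is an illustrative sketch only: `backoff_delays` is a hypothetical helper, and the 2-second start and 1024-second cap are taken from the code above.

```python
def backoff_delays(start=2, limit=1024):
    """List the sleep times of a backoff that doubles until it reaches `limit`."""
    delays = []
    delay = start
    while delay < limit:
        delays.append(delay)
        delay *= 2
    return delays


delays = backoff_delays()
print(delays)       # [2, 4, 8, 16, 32, 64, 128, 256, 512]
print(sum(delays))  # 1022
```

Under these assumptions a page that keeps failing would consume about 17 minutes before the loop gives up.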

