
Why is my GitHub code search hitting secondary rate limits?

I am searching for GitHub files containing the string "torch". Since the search API limits searches to the first 1,000 results, I am searching based on file sizes, as suggested here. However, I keep hitting the secondary rate limit. Could someone suggest whether I am doing something wrong, or whether there is a way to optimize my code to avoid these rate limits? I have already looked at the best practices for dealing with rate limits. Here is my code:

import requests
import httplink
import time

# Search for code containing the string "torch", slicing the query by file size
# (0 to 500,000 bytes in 250-byte windows) to work around the result cap
for size in range(0, 500000, 250):
    print("size = ", size, " size + 250 = ", size + 250)
    url = ("https://api.github.com/search/code"
           "?q=torch+in:file+language:python+size:"
           + str(size) + ".." + str(size + 250)
           + "&page=1&per_page=10")

    headers = {"Authorization": "Token xxxxxxxxxxxxxxx"}  # put your token here

    # Sleep duration when the secondary rate limit is reached
    backoff = 256

    total = 0
    cond = True

    # This while loop walks over all pages of results => pagination
    while cond:
        try:
            time.sleep(2)
            res = requests.get(url, headers=headers)
            res.raise_for_status()
            link = httplink.parse_link_header(res.headers["link"])

            data = res.json()
            for n, item in enumerate(data["items"], start=total):
                print(f'[{n}] {item["html_url"]}')

            if "next" not in link:
                break

            total += len(data["items"])
            url = link["next"].target

        # Catch the secondary rate limit so the computation does not stop
        except requests.exceptions.HTTPError as err:
            print("err = ", err)
            print("err.response.text = ", err.response.text)
            print("backoff = ", backoff)
            time.sleep(backoff)
        # Catch the KeyError raised when a size window returns no Link header
        # (a single page of results, or none at all)
        except KeyError as error:
            print("err = ", error)
            # Stop the while loop for this size window
            cond = False
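
As an aside, requests already parses the Link response header into res.links, so the pagination could be written without the httplink dependency. A minimal sketch of just the pagination loop (no backoff handling; the query string shown is the first size window from above):

import requests

url = ("https://api.github.com/search/code"
       "?q=torch+in:file+language:python+size:0..250&per_page=100")
headers = {"Authorization": "Token xxxxxxxxxxxxxxx"}  # your token here

while True:
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    for item in res.json()["items"]:
        print(item["html_url"])
    next_link = res.links.get("next")  # res.links is an empty dict on the last page
    if next_link is None:
        break
    url = next_link["url"]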

Based on this answer, it seems this is a common occurrence. However, I was hoping someone could suggest a workaround.

I have added the Octokit tag, although I am not using it, to increase visibility, since this seems to be a common problem.

A big chunk of the above logic/code was obtained through SO answers; I highly appreciate all the support from the community.

Note that search has its own primary and secondary rate limits, which are lower than those of other endpoints. For JavaScript, we have a throttle plugin that implements all the recommended best practices. For search, we limit requests to 1 per 2 seconds. Hope that helps!
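
For Python, a rough equivalent of those best practices would be to pace search calls at no more than one per two seconds and to honor the Retry-After header on 403/429 responses. Here is a minimal sketch under those assumptions (the throttled_search helper and its parameters are illustrative, not part of any library):

import time
import requests

SEARCH_URL = "https://api.github.com/search/code"
HEADERS = {"Authorization": "Token xxxxxxxxxxxxxxx"}  # your token here

def throttled_search(params, min_interval=2.0, max_retries=5):
    """Issue one search request, pacing to at most 1 request per 2 seconds
    and sleeping when the secondary rate limit responds."""
    for attempt in range(max_retries):
        res = requests.get(SEARCH_URL, headers=HEADERS, params=params)
        if res.status_code in (403, 429):
            # Rate limited: wait as long as GitHub asks via Retry-After,
            # or fall back to an exponential delay if the header is absent
            wait = int(res.headers.get("Retry-After", 30 * 2 ** attempt))
            print(f"rate limited, sleeping {wait}s")
            time.sleep(wait)
            continue
        res.raise_for_status()
        time.sleep(min_interval)  # stay at or under 1 search request per 2 s
        return res
    raise RuntimeError("still rate limited after several retries")

For example, throttled_search({"q": "torch in:file language:python size:0..250", "per_page": 100}) would fetch the first size window while staying inside the documented search limit.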
