Consuming paginated Github API data using HTTParty with Ruby on Rails

I'm building a web scraper for Github's repository data and storing specific repo attributes in a local database. I'm currently running into an issue pulling data beyond their one page (100 records) limit.

Here's my API call and method to extract the appropriate data:

require 'active_interaction'
require 'httparty'
require 'json'
require 'time'

class GitHubGet < ActiveInteraction::Base
  def execute
    response = HTTParty.get(process_path)
    extract_github_data(response)
  end

  def extract_github_data(response)
    parsed_response = JSON.parse(response.body)
    result = []
    parsed_response["items"].each do |item|
      # updated_at comes back as an ISO 8601 string, so parse it before comparing to a Time
      if Time.parse(item["updated_at"]) > 1.day.ago
        result << {
          name: item["name"],
          owner: item["owner"]["login"],
          url: item["url"],
          stars: item["stargazers_count"]
        }
      end
    end
    puts result
  end

  private

  def process_path
    "https://api.github.com/search/repositories?q=license:mit+license:apache-2.0+license:gpl+license:lgpl+stars:1..2000+fork:false&per_page=100"
  end
end

Any help on how to pull in more than one page of data would be greatly appreciated! Thanks!

The headers in the response object include a Link key whose value contains the URL of the next page.
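
Here is a minimal sketch of how that header could be followed with HTTParty. The fetch_all_pages helper and its loop are illustrative assumptions, not part of the original class:

require 'httparty'
require 'json'

# Hypothetical helper that keeps requesting pages until the Link header
# no longer advertises a rel="next" URL.
def fetch_all_pages(start_url)
  items = []
  url = start_url
  while url
    response = HTTParty.get(url)
    items.concat(JSON.parse(response.body)["items"])

    # GitHub's Link header looks like:
    #   <https://api.github.com/search/repositories?...&page=2>; rel="next", <...>; rel="last"
    link_header = response.headers["link"]
    next_link = link_header&.split(",")&.find { |part| part.include?('rel="next"') }
    url = next_link && next_link[/<([^>]+)>/, 1] # pull the URL out of the angle brackets
  end
  items
end

You could call it with the same search URL, e.g. fetch_all_pages(process_path). Note that the Search API returns at most 1,000 results per query, so even with pagination you may need to split the query (for example by star ranges) to cover everything.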
