Consuming paginated GitHub API data using HTTParty with Ruby on Rails
I'm building a web scraper for GitHub's repository data and storing specific repo attributes in a local database. I'm currently running into an issue pulling data beyond their one-page (100 records) limit.
Here's my API call and method to extract the appropriate data:
require 'active_interaction'
require 'httparty'
require 'json'
require 'time'

class GitHubGet < ActiveInteraction::Base
  def execute
    response = HTTParty.get(process_path)
    extract_github_data(response)
  end

  def extract_github_data(response)
    parsed_response = JSON.parse(response.body)
    result = []
    parsed_response["items"].each do |item|
      # updated_at is an ISO 8601 string; parse it before comparing with a Time
      if Time.parse(item["updated_at"]) > 1.day.ago
        result << {
          name: item["name"],
          owner: item["owner"]["login"],
          url: item["url"],
          stars: item["stargazers_count"]
        }
      end
    end
    result
  end

  private

  def process_path
    "https://api.github.com/search/repositories?q=license:mit+license:apache-2.0+license:gpl+license:lgpl+stars:1..2000+fork:false&per_page=100"
  end
end
Any help on how to pull in more than one page of data would be greatly appreciated! Thanks!
The headers in the response object include a Link key, which carries the URL of the next page.
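As a sketch of how that can look with HTTParty (the helper names `next_page_url` and `each_page` are my own, not part of the question's class), GitHub's Link header has the form `<...&page=2>; rel="next", <...&page=34>; rel="last"`, and you keep requesting until there is no `rel="next"` entry:

```ruby
require "json"
# Assumes `require "httparty"` is already done, as in the question's code.

# Pull the rel="next" URL out of a Link header; returns nil on the last page.
def next_page_url(link_header)
  return nil unless link_header
  link_header.split(",").each do |part|
    url_part, rel_part = part.split(";", 2)
    url = url_part.strip.delete_prefix("<").delete_suffix(">")
    return url if rel_part.to_s.include?('rel="next"')
  end
  nil
end

# Follow the pagination chain, yielding each parsed page of results.
def each_page(first_url)
  url = first_url
  while url
    response = HTTParty.get(url)
    yield JSON.parse(response.body)
    # HTTParty header lookup is case-insensitive, so "link" works here
    url = next_page_url(response.headers["link"])
  end
end
```

In `execute`, the single `HTTParty.get(process_path)` call could then become something like `each_page(process_path) { |page| ... collect items from page ... }`. Note that the GitHub Search API only returns the first 1,000 results for any query, so narrowing the query (e.g. by star ranges) is needed to scrape beyond that.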