
Randomly sample GitHub repositories

I'm looking for a way to randomly sample repos from GitHub. The goal is to perform some data analysis on the sample.

What I would like to do is sample by the repository's id: draw an int between 0 and 2.7 million and find the associated repo. Once I have the username/repo-name, I'll use the API to get the details.

The problem is that I do not know how to search by repo id. Any suggestions? I'm open to web scraping or Python solutions.

You can use Python to access the GitHub v3 API (see "Most suitable python library for Github API v3").

You can also list GitHub repos starting from a certain id with GET /repositories, which takes as its parameter (since) the integer ID of the last repository that you've seen. That gives you a roundabout way to reach repos by their id.
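As a rough sketch of that idea, the following draws a random id and takes the first repository that GET /repositories returns after it. The names fetch_repos_since and sample_repo_names are made up for illustration, the 2.7 million upper bound is simply the figure from the question, and the request headers are an assumption about what the API accepts, so treat this as a starting point rather than a tested client.

```python
import json
import random
import urllib.request

API_URL = "https://api.github.com/repositories"
MAX_REPO_ID = 2_700_000  # upper bound from the question; an assumption, not a current figure


def fetch_repos_since(repo_id):
    """Return one page of public repositories whose id is greater than repo_id."""
    request = urllib.request.Request(
        f"{API_URL}?since={repo_id}",
        # Header values are assumptions; adjust them (and add a token) as needed.
        headers={"Accept": "application/vnd.github+json", "User-Agent": "repo-sampler"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def sample_repo_names(n, max_id=MAX_REPO_ID, fetch=fetch_repos_since, rng=random):
    """Draw n random ids and collect the full_name of the first repo listed after each."""
    names = []
    for _ in range(n):
        start = rng.randrange(max_id)  # random id in [0, max_id)
        page = fetch(start)            # repos with id > start, in ascending id order
        if page:                       # an empty page means nothing was listed after start
            names.append(page[0]["full_name"])
    return names
```

Calling sample_repo_names(5) would issue five requests and return up to five "username/repo-name" strings to pass to the rest of the API. Two caveats: unauthenticated requests are rate limited, and because ids of private or deleted repos are skipped, a repo that follows a long gap is more likely to be picked, so the sample is not perfectly uniform.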
