
How to download a GitHub release asset from a private repository in Python?

I'm using PyGithub and I can find the assets I want to download, but I can't figure out how to actually fetch them because they're in a private repository. I've found similar questions here and here, but I'm looking for a Python (3.7) solution.

This is the code I'm using to get the asset info of the asset I want:

from github import Github
g = Github('username', 'password')
asset = g.get_repo('user/repo').get_latest_release().get_assets()[0]
url = asset.browser_download_url

Now, I can verify url is correct by visiting it in my browser (which is already logged in to GitHub) and the download of the correct file immediately starts. Since pygithub doesn't seem to have a download option for assets, I've been trying to use requests to accomplish the same goal:

import requests
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
sess = requests.Session()
sess.auth = 'username', 'password'
response = sess.get(url, headers={'user-agent': user_agent})

At this point response is always <Response [404]>. Since I'm sure the URL works fine in my browser, I'm guessing I'm missing something about authenticating with GitHub before trying to download the file.

Any help would be appreciated (even if it requires installing other packages).

I ended up solving it by getting the "authenticity token" from the GitHub login page first, and then posting it:

import requests
from pathlib import Path
from github import Github
from bs4 import BeautifulSoup as bs

auth = 'username', 'password'
asset = Github(*auth).get_repo('user/repo').get_latest_release().get_assets()[0]
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
headers = {'user-agent': user_agent}
login_url = 'https://github.com/session'
session = requests.Session()
# Fetch the login page and read the hidden authenticity_token form field
response = session.get(login_url, headers=headers)
authenticity_token = bs(response.text, 'lxml').find('input', attrs={'name': 'authenticity_token'})['value']
# Post the login form; the session keeps the resulting cookies
session.post(
    login_url,
    headers=headers,
    data=dict(
        commit='Sign in',
        utf8='%E2%9C%93',
        login=auth[0],
        password=auth[1],
        authenticity_token=authenticity_token
    )
)
# Now I'm logged in properly, I can download the private repository assets
response = session.get(asset.browser_download_url, headers=headers)
save_to = Path.home() / 'Downloads' / asset.name
save_to.write_bytes(response.content)
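
The token-extraction step above can also be sketched without bs4 and lxml, using only the standard library. This is a minimal sketch, not the code from the answer: the class HiddenInputFinder, the function extract_hidden_field and the SAMPLE_FORM markup are hypothetical names and data made up for illustration, and the real GitHub login page may be structured differently or change at any time.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Remember the value of the first <input> with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag != "input" or self.value is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.value = attributes.get("value")


def extract_hidden_field(html, field_name="authenticity_token"):
    """Return the value of the named form field, or raise ValueError."""
    finder = HiddenInputFinder(field_name)
    finder.feed(html)
    if finder.value is None:
        raise ValueError("no input named %r found" % field_name)
    return finder.value


# Made-up markup for illustration only, not GitHub's actual login form
SAMPLE_FORM = (
    '<form action="/session" method="post">'
    '<input type="hidden" name="authenticity_token" value="abc123">'
    '<input type="text" name="login">'
    '</form>'
)
```

In the code above, extract_hidden_field(response.text) would stand in for the bs(...).find(...) line. Raising an error when the field is missing makes it easier to notice when the page layout has changed.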

I think GitHub changed the authentication requirements for its API, so Github('username', 'password') no longer works for getting information about private repositories. Instead, generate a Personal Access Token ( https://github.com/settings/tokens ) for your application and use that. This way you also don't have to store your login data in your script, which is much safer.

Using a token also simplifies the code a bit. Here is an example that downloads one asset and saves it to a path, and reads a second asset into a local variable. I have not yet figured out how to download a non-text file (such as a pickle) into a local variable.
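
To keep the token out of the script entirely, one option is to read it from an environment variable and fail loudly when it is missing. This is a minimal sketch: the helper name read_github_token is hypothetical, and the variable name GITHUB_TOKEN is an assumption you can change.

```python
import os


def read_github_token(env=None, var_name="GITHUB_TOKEN"):
    """Return the personal access token stored in an environment mapping.

    Raises RuntimeError rather than falling back to a hardcoded value,
    so a missing token is noticed immediately.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            "Set the %s environment variable to a personal access token" % var_name
        )
    return token
```

The result can then be passed to Github(...) and used to build the Authorization header for requests.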

import os
from io import StringIO
from pathlib import Path

import pandas as pd
import requests
from github import Github

rawtoken = "githubtoken"  # placeholder fallback; prefer setting GITHUB_TOKEN
repository = "owner/reponame"

token = os.getenv('GITHUB_TOKEN', rawtoken)
g = Github(token)
# Requesting the API asset URL (asset.url) with this Accept header
# returns the file contents instead of the asset's JSON metadata.
headers = {'Authorization': 'token ' + token,
           'Accept': 'application/octet-stream'}
session = requests.Session()

# First asset: an arbitrary file, streamed to disk in 1 MiB chunks
asset_one = g.get_repo(repository).get_latest_release().get_assets()[0]
response = session.get(asset_one.url, stream=True, headers=headers)
response.raise_for_status()
dest = Path() / "downloads" / asset_one.name
dest.parent.mkdir(parents=True, exist_ok=True)  # create ./downloads if missing
with open(dest, 'wb') as f:
    for chunk in response.iter_content(1024*1024):
        f.write(chunk)

# Second asset: a CSV file, read into the pandas DataFrame df
asset_two = g.get_repo(repository).get_latest_release().get_assets()[1]
df_response = session.get(asset_two.url, headers=headers)
df_response.raise_for_status()
data = StringIO(df_response.text)
df = pd.read_csv(data)
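
For the non-text case mentioned above, one possible approach is to use the raw bytes in response.content instead of response.text, and wrap them in io.BytesIO where a file-like object is needed. This is a sketch under assumptions rather than a tested recipe against GitHub: the helper names are hypothetical, and session is assumed to behave like requests.Session.

```python
import io
import pickle


def asset_headers(token):
    """Headers for fetching a release asset's contents via the API URL."""
    return {
        "Authorization": "token " + token,
        "Accept": "application/octet-stream",
    }


def fetch_asset_bytes(session, asset_url, token):
    """Return the asset's raw bytes; session should act like requests.Session."""
    response = session.get(asset_url, headers=asset_headers(token))
    response.raise_for_status()
    return response.content  # bytes, unlike response.text


def fetch_asset_buffer(session, asset_url, token):
    """Return the asset as an in-memory binary file object."""
    return io.BytesIO(fetch_asset_bytes(session, asset_url, token))


def load_pickled_asset(session, asset_url, token):
    """Unpickle an asset. Only do this with data you trust."""
    return pickle.loads(fetch_asset_bytes(session, asset_url, token))
```

With the names from the example above, obj = load_pickled_asset(session, asset_two.url, token) would load a pickled asset into a variable. The buffer from fetch_asset_buffer can be passed to readers that accept binary file objects.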


 