Load .zip file from GitHub in Google Colab

I have a zip file in my GitHub repo and want to load it into my Google Colab files. I have its URL, from which it can be downloaded, like https://raw.githubusercontent.com/rehmatsg/../master/...zip

I used this method to download the file into Google Colab:

from google.colab import files

url = 'https://raw.githubusercontent.com/user/.../master/...zip'
files.download(url)

But I get this error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-c974a89c0412> in <module>()
      3 from google.colab import files
      4 
----> 5 files.download(url)

/usr/local/lib/python3.7/dist-packages/google/colab/files.py in download(filename)
    141       raise OSError(msg)
    142     else:
--> 143       raise FileNotFoundError(msg)  # pylint: disable=undefined-variable
    144 
    145   comm_manager = _IPython.get_ipython().kernel.comm_manager

FileNotFoundError: Cannot find file: https://raw.githubusercontent.com/user/.../master/...zip

Files in Google Colab are temporary, so I cannot upload it each time. This is the reason I wanted to host the file in my project's GitHub repo. What would be the correct method to download the file into Google Colab?

Use the "wget" bash command. Open the GitHub project and go to the "download as zip" option (at top right). Then copy the URL and use the "wget" command, prefixed with "!" in a Colab cell:

!wget https://github.com/nytimes/covid-19-data/archive/refs/heads/master.zip
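The address used above follows GitHub's pattern for branch archives, so it can be assembled rather than copied by hand. A minimal sketch, assuming a public repository; the helper name archive_zip_url is made up for illustration and is not part of any library:

```python
def archive_zip_url(owner: str, repo: str, branch: str = "master") -> str:
    """Build the 'download as zip' address for one branch of a public GitHub repo."""
    # Hypothetical helper: it only formats the URL and does not check that the repo exists.
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


# Reproduces the address from the wget example above.
print(archive_zip_url("nytimes", "covid-19-data"))
```

The printed string can be passed to !wget in a Colab cell. Note that this fetches the whole repository, not a single file.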

Good luck.

Suppose, for example, that the GitHub repo https://github.com/lukyfox/Datafiles contains a folder digits with two zip files, digits.zip and digits_small.zip. To download and unzip a particular zip file from the repo (not the whole repo or folder, only digits.zip) into Google Colab session storage:

  1. Go to the zip file you want to download (e.g. https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip).
  2. Locate the Download button and copy its address (right-click -> Copy link address). For the example above, the copied address is https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
  3. Go to your Google Colab notebook and use the !wget command with the copied address to download the file, then !unzip to extract it into session storage:

!wget https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
!unzip /content/digits.zip

You can also rename the file after download or specify a folder name for the unzipped data.
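Renaming and extracting into a chosen folder can also be done from Python instead of shell commands. This is only a sketch using the standard library; the function names rename_download and unzip_to, and the example paths in the usage note, are assumptions for illustration:

```python
import zipfile
from pathlib import Path


def rename_download(old_path, new_name):
    """Rename a downloaded file, keeping it in the same folder, and return the new path."""
    old = Path(old_path)
    new = old.with_name(new_name)
    old.rename(new)
    return new


def unzip_to(zip_path, target_dir):
    """Extract zip_path into target_dir (created if missing) and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For instance, unzip_to("/content/digits.zip", "/content/digits_data") would place the archive's contents in a digits_data folder, assuming the zip was already downloaded to /content.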

You may notice that the downloadable address differs only slightly from the zip file's page address. In fact, it should be enough to replace blob with raw to get the right address for any zip file.
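The blob-to-raw substitution is simple enough to write as a small function. A minimal sketch; blob_to_raw is a made-up name, and the function assumes the first "/blob/" in the address is the one GitHub inserted (it would misfire for an owner or repo literally named "blob"):

```python
def blob_to_raw(url: str) -> str:
    """Turn a GitHub file-page URL (/blob/) into its direct-download form (/raw/)."""
    marker = "/blob/"
    if marker not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Replace only the first occurrence so a folder named 'blob' deeper in the path is kept.
    return url.replace(marker, "/raw/", 1)


print(blob_to_raw("https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip"))
# prints https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
```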

You could do this to clone the entire repository:

!git clone https://personalaccesstoken@github.com/username/reponame.git

This creates a folder called reponame and is convenient if you have many files to download. The personalaccesstoken allows private repositories to be accessed.
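Since a token typed into a notebook cell is visible to anyone who can see the notebook, one option is to read it from an environment variable and build the clone address in Python. This is a sketch, not the answer's method; the variable name GITHUB_TOKEN and both function names are assumptions:

```python
import os
import subprocess


def clone_url(username, reponame, token=None):
    """Build an HTTPS clone URL, embedding a personal access token if one is given."""
    auth = f"{token}@" if token else ""
    return f"https://{auth}github.com/{username}/{reponame}.git"


def clone(username, reponame, token_env="GITHUB_TOKEN"):
    """Run `git clone`, taking the token from an environment variable (hypothetical name)."""
    token = os.environ.get(token_env)  # None if unset, which suits public repos
    subprocess.run(["git", "clone", clone_url(username, reponame, token)], check=True)
```

Calling clone("username", "reponame") behaves like the !git clone line above, creating a reponame folder in the current directory.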

I do not use Google Colab, but I looked at this description. As I understand it, google.colab.files.download is for downloading files from the Colab session to your own machine. It is not for fetching a file from an arbitrary URL. If this file is public, you can use other libraries to retrieve it. For example, you can use urllib:

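from urllib.request import urlretrieve

# With no second argument the file goes to a temporary location; the call returns (path, headers).
path, headers = urlretrieve(url)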
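Passing a second argument to urlretrieve saves the file under a name of your choosing. The sketch below also checks that what arrived is really a zip archive, since a page address (for example a /blob/ URL) returns HTML instead. The function name download_zip is an assumption for illustration:

```python
import zipfile
from urllib.request import urlretrieve


def download_zip(url, filename):
    """Download url to filename and check that the result is really a zip archive."""
    path, _headers = urlretrieve(url, filename)
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{url} did not return a zip archive")
    return path
```

In Colab, download_zip(url, "/content/data.zip") would leave the archive in session storage, where data.zip is just an example name.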

If you decide you need more files, or the code itself, consider the other answer about git clone.
