How to Upload Many Files to Google Colab?

I am working on an image segmentation machine learning project and I would like to test it out on Google Colab.

For the training dataset, I have 700 images, mostly 256x256, that I need to upload into a Python numpy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to upload them to Google Colab for use in my project.

So far I have attempted using Google Fuse, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.

How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?

You can put all your data into your Google Drive and then mount the drive. This is how I have done it. Let me explain in steps.

Step 1: Transfer your data into your Google Drive.

Step 2: Run the following code to mount your Google Drive.

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse



# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()


# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}


# Create a directory and mount Google Drive using that directory.
!mkdir -p Drive
!google-drive-ocamlfuse Drive


!ls Drive/

# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > Drive/created.txt

Step 3: Run the following line to check that you can see your data in the mounted drive.

!ls Drive

Step 4: Now load your data as follows. My train, cv, and test data were stored in Excel files. pandas returns a DataFrame, which can be converted to a numpy array with .to_numpy().

import pandas as pd

train_data = pd.read_excel(r'Drive/train.xlsx')
test = pd.read_excel(r'Drive/test.xlsx')
cv = pd.read_excel(r'Drive/cv.xlsx')
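
The question is about image files spread across subfolders rather than spreadsheets. A minimal sketch of gathering such files from the mounted folder could look like this. The helper name `find_images` and the extension list are assumptions for illustration, not part of the original answer.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust it to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_images(root):
    """Return a sorted list of image paths found anywhere below `root`."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Example, assuming Drive is mounted in a folder named "Drive":
# image_paths = find_images("Drive")
# print(len(image_paths), "images found")
```

Each returned path can then be opened with the image library of your choice and the results stacked into a numpy array.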

I hope this helps.

Edit

To save data from the Colab notebook environment to your Drive, you can run the following code.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials



# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)



# Create & upload a file.
# PyDrive takes the Drive file name from the 'title' metadata key.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

Here are a few steps to upload a large dataset to Google Colab:

1. Upload your dataset to free cloud storage such as Dropbox, Openload, etc. (I used Dropbox.)
2. Create a shareable link for your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:

    !wget your_shareable_file_link

That's it!
You can compress your dataset into a zip file and unzip it after downloading it in Google Colab by using this command:

    !unzip downloaded_filename -d destination_folder
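
The same two steps (download, then extract) can also be done from Python with the standard library. This is only a sketch: the function name `download_and_extract` and the default archive name are assumptions, and it presumes the link points directly at a zip file.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir, archive_name="dataset.zip"):
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Returns the list of member names stored in the archive.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / archive_name

    # Rough equivalent of: !wget your_shareable_file_link
    urllib.request.urlretrieve(url, str(archive_path))

    # Rough equivalent of: !unzip downloaded_filename -d destination_folder
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

A call might look like `download_and_extract(your_shareable_file_link, "data")`, where `your_shareable_file_link` is the link copied in step 2.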

Zip your files first, then upload the archive to Google Drive.

Use this simple command to unzip it:

!unzip {file_location}

Example:

!unzip drive/models.zip
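
If you would rather create the archive from Python on your own machine before uploading, the standard library can do it. A minimal sketch, where the function name `zip_folder` is an assumption for illustration:

```python
import shutil


def zip_folder(folder, archive_base):
    """Create `archive_base`.zip from the contents of `folder`.

    Returns the path of the archive that was written.
    """
    # Paths inside the archive are stored relative to `folder`,
    # so the subfolder structure is preserved when unzipping.
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


# Example with hypothetical names:
# zip_folder("my_dataset", "my_dataset")  # writes my_dataset.zip
```

Uploading one archive avoids transferring thousands of small image and mask files one by one.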

Step 1: Mount the Drive by running the following command:

from google.colab import drive
drive.mount('/content/drive')

This will output a link. Click on the link, hit allow, copy the authorization code, and paste it into the box in the Colab cell labeled "Enter your authorization code:". This process just gives Colab permission to access your Google Drive.

Step 2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive.

Step 3: Now work your way through the Drive directories and files to locate your uploaded folder or zipped file.

This process may look something like this: the current working directory in Colab when you start off will be /content/. Just to make sure, run the following command in a cell:

!pwd

It will show you the current directory you are in (pwd stands for "print working directory"). Then use a command like:

!ls

to list the directories and files in the directory you are in, and the command:

%cd /directory/name/of/your/choice

to move into the directories to locate your uploaded folder or the uploaded .zip file. (Use %cd rather than !cd here: a directory change made with !cd runs in a temporary subshell and does not carry over to later commands.)
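
The same navigation can be done from Python instead of shell commands. The sketch below prints the working directory and its contents, and adds a small search helper. The name `locate` and its default root `/content` are assumptions for illustration.

```python
import os


def locate(name, root="/content"):
    """Return the full path of every file or folder called `name` below `root`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            if entry == name:
                matches.append(os.path.join(dirpath, entry))
    return matches


print(os.getcwd())      # the directory that !pwd would report
print(os.listdir("."))  # entries of the current directory, similar to !ls
```

For example, `locate("my_dataset.zip")` would search everything under `/content`, including a mounted Drive, for a file with that hypothetical name.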

And just like that, you are ready to get your hands dirty with your machine learning model! :)

Hopefully, these simple steps will keep you from spending too much time figuring out how Colab works when you should be spending the majority of your time on the machine learning model, its hyperparameters, pre-processing...

You might want to try the kaggle-cli module, as discussed here.

There are many ways to do so:

  1. You might want to push your data into a GitHub repository; then in a Google Colab code cell you can run:

    !git clone https://www.github.com/{repo}.git

  2. You can upload your data to Google Drive; then in your code cell:

from google.colab import drive

drive.mount('/content/drive')

  3. Use the transfer.sh tool. You can visit here to see how it works:

    transfer.sh

Google Colab has made it more convenient for users to upload files (from the local machine, Google Drive, or GitHub). Click the Mount Drive option in the pane on the left side of the notebook and you will get access to all the files stored in your Drive.

Select the file -> right-click -> Copy path.

Use Python import methods to import files from this path, for example:

import pandas as pd
data = pd.read_csv('your copied path here')

To import multiple files in one go, you may need to write a function.
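
Such a function might look like the following sketch. It uses the standard library `csv` module so that it runs without extra packages; the name `read_all_csv` is an assumption, and `pd.read_csv(path)` could be swapped in for the reader if you prefer DataFrames.

```python
import csv
from pathlib import Path


def read_all_csv(folder):
    """Read every .csv file directly inside `folder`.

    Returns a dict mapping each file name to a list of row dicts.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            tables[path.name] = list(csv.DictReader(f))
    return tables


# Example, using a path copied from the Colab file pane:
# tables = read_all_csv("your copied folder path here")
```

Keying the result by file name makes it easy to pick out one table later, for instance `tables["train.csv"]` for a hypothetical file of that name.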
