
How to load a .npy file as a numpy array into the virtual machine on Google Colab

I have some datasets and labels that are basically numpy saved files with the extension .npy.

I have saved train.npy and train_labels.npy in my Google Drive.

While using Google Colab, I have to use that data. I am able to find the folder and the IDs of the data files in my Drive. How do I load those data files into the memory of the virtual machine that Google Colab uses?

Solved it.

First, do the simple authentication as described in the docs:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

I wrote some helper functions. The first returns the file ID when you know the file name and the ID of the Drive folder that contains the file; the second uploads text content to Drive as a new file. The folder ID is the last part of the folder's link, after drive.google.com/../../folders/

def get_file_from_drive(folder_id, file_name):
  # List every non-trashed file in the folder, then match on the title.
  query = "'" + folder_id + "' in parents and trashed=false"
  file_list = drive.ListFile({'q': query}).GetList()
  for file in file_list:
    if file['title'] == file_name:
      return file['id']

def upload_file_to_drive(file_name, file_data):
  uploaded = drive.CreateFile({'title': file_name})
  uploaded.SetContentString(file_data)
  uploaded.Upload()
  print('Uploaded file with ID {}'.format(uploaded.get('id')))

drive_folder_id = '<Folder ID>'
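One thing to watch in the lookup helper above: when no file in the folder matches, it falls off the end of the loop and returns None. Below is a minimal sketch of the same matching logic, split out so it can be tried without a Drive connection. The names build_folder_query, find_file_id and sample_listing are hypothetical and not part of PyDrive; the sample list only imitates the 'title' and 'id' keys that the code above reads.

```python
def build_folder_query(folder_id):
    """Build the query string for non-trashed files inside one folder."""
    return "'{}' in parents and trashed=false".format(folder_id)


def find_file_id(file_list, file_name):
    """Return the 'id' of the first entry whose 'title' equals file_name."""
    for entry in file_list:
        if entry['title'] == file_name:
            return entry['id']
    # Fail loudly instead of returning None when nothing matches.
    raise FileNotFoundError(
        'No file named {!r} in the folder listing'.format(file_name))


# Hypothetical stand-in for the list returned by drive.ListFile(...).GetList().
sample_listing = [
    {'title': 'train.npy', 'id': 'id-train'},
    {'title': 'train_labels.npy', 'id': 'id-labels'},
]

print(build_folder_query('<Folder ID>'))
print(find_file_id(sample_listing, 'train_labels.npy'))
```

Raising an error here is a design choice for the sketch; returning None as the original helper does also works, as long as the caller checks for it.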

This function downloads the files from Google Drive into the file system of the virtual machine that Colab lets you use.

def upload_data_system():
  downloaded = drive.CreateFile({'id': get_file_from_drive(drive_folder_id, 'train.npy')})
  downloaded.GetContentFile('train.npy') 

  downloaded = drive.CreateFile({'id': get_file_from_drive(drive_folder_id, 'train_labels.npy')})
  downloaded.GetContentFile('train_labels.npy')

upload_data_system()

Voilà! Your files are now in the virtual machine's file system and can be loaded into memory with plain Python, just as you would do locally. To verify, run this on Colab. You should see your files listed:

import os

for f in os.listdir('.'):
  if os.path.isfile(f):
    print(f)
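If you would rather check for specific files than read through the whole listing, something along these lines should work. It is only a sketch: missing_files is a hypothetical helper name, and a temporary directory with one empty file stands in for the Colab working directory so the snippet runs anywhere.

```python
import os
import tempfile


def missing_files(expected_names, directory='.'):
    """Return the names from expected_names that are not regular files in directory."""
    return [name for name in expected_names
            if not os.path.isfile(os.path.join(directory, name))]


# Demo in a scratch directory; on Colab you would pass '.' instead.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'train.npy'), 'wb').close()  # empty placeholder file

still_missing = missing_files(['train.npy', 'train_labels.npy'], demo_dir)
print(still_missing)
```

An empty list means every expected file was found.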

Now load your numpy file with np.load(path_to_file_in_filesystem).
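For completeness, here is a small self-contained sketch of that last step. The arrays are made-up placeholders written to a temporary directory so the snippet runs without Drive; on Colab you would point np.load at the downloaded train.npy and train_labels.npy instead. The shapes and dtypes are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

# Write two placeholder arrays so there is something to load.
work_dir = tempfile.mkdtemp()
train_path = os.path.join(work_dir, 'train.npy')
labels_path = os.path.join(work_dir, 'train_labels.npy')
np.save(train_path, np.zeros((4, 3), dtype=np.float32))
np.save(labels_path, np.arange(4))

# Load the files back into memory as numpy arrays.
train = np.load(train_path)
train_labels = np.load(labels_path)
print(train.shape, train.dtype)
print(train_labels.shape)

# Optional: memory-map a large file instead of reading it all at once.
train_mm = np.load(train_path, mmap_mode='r')
print(train_mm.shape)
```

The mmap_mode='r' variant may help when the dataset is larger than the RAM available to the virtual machine, since data is read from disk only as it is accessed.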


 