Google Colab: how to read data from my Google Drive?

The problem is simple: I have some data on gDrive, for example at /projects/my_project/my_data*.

I also have a simple notebook in gColab.

So, I would like to do something like:

for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)

Unfortunately, all the examples (like this one: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb) suggest loading all the necessary data into the notebook.

But if I have a lot of pieces of data, that can get quite complicated. Is there any way to solve this issue?

Thanks for the help!

Edit: As of February 2020, there's now a first-class UI for automatically mounting Drive.

First, open the file browser on the left-hand side. It will show a 'Mount Drive' button. Once clicked, you'll see a permissions prompt to mount Drive, and afterwards your Drive files will be present with no setup when you return to the notebook. The completed flow looks like so:

(Image: Drive auto-mount example)

The original answer follows below. (This will also still work for shared notebooks.)

You can mount your Google Drive files by running the following code snippet:

from google.colab import drive
drive.mount('/content/drive')

Then, you can interact with your Drive files in the file browser side panel or using command-line utilities.

Here's an example notebook.

Good news, PyDrive has first-class support on CoLab! PyDrive is a wrapper for the Google Drive Python client. Here is an example of how you would download ALL files from a folder, similar to using glob + *:

!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
os.makedirs(local_download_path, exist_ok=True)

# 2. Auto-iterate using the query syntax
#    https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents"}).GetList()

for f in file_list:
  # 3. Create & download by id.
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  f_.GetContentFile(fname)


with open(fname, 'r') as f:
  print(f.read())

Notice that the argument to drive.ListFile is a dictionary that matches the parameters used by the Google Drive HTTP API (you can customize the q parameter to suit your use case).
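
As a rough illustration of tuning q, the sketch below assembles the query string for a folder listing. build_drive_query and its parameters are hypothetical names introduced here, not part of PyDrive, and the "title contains" and "trashed=false" terms assume the Drive v2 search syntax linked in the snippet above.

```python
def build_drive_query(folder_id, title_contains=None, include_trashed=False):
    """Build a Drive v2 search string for the children of one folder."""
    def quote(value):
        # Escape backslashes and single quotes, then wrap in single quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    terms = [quote(folder_id) + " in parents"]
    if title_contains:
        terms.append("title contains " + quote(title_contains))
    if not include_trashed:
        terms.append("trashed=false")
    return " and ".join(terms)


# Hypothetical usage with the PyDrive client from the snippet above:
# file_list = drive.ListFile({'q': build_drive_query(folder_id, 'my_data')}).GetList()
query = build_drive_query("1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk", "my_data")
print(query)
```

Filtering on the server side like this is closer to glob("my_data*") than downloading the whole folder and filtering afterwards.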

Know that in all cases, files and folders on Google Drive are identified by ids (note the 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk). This requires that you find the specific id of the folder you want to root your search in.

For example, navigate to the folder "/projects/my_project/my_data" in your Google Drive.

(Image: Google Drive)

See that it contains some files, which we want to download to CoLab. To get the id of the folder so that PyDrive can use it, look at the URL and extract the id. In this case, the URL corresponding to the folder was:

https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk

Where the id is the last piece of the URL: 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk.
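
If you would rather not cut the id out by hand, something like the following could do it. This is only a sketch: folder_id_from_url is a made-up helper name, and the handling of "?id=" style links is an assumption about older share URLs rather than something shown above.

```python
from urllib.parse import urlparse, parse_qs


def folder_id_from_url(url):
    """Return the Drive folder id embedded in a folder URL or an ?id= style URL."""
    parsed = urlparse(url)
    # Assumed older share-link form: the id is a query parameter (...?id=<id>).
    query_ids = parse_qs(parsed.query).get("id")
    if query_ids:
        return query_ids[0]
    # Folder links end in .../folders/<id>.
    parts = [p for p in parsed.path.split("/") if p]
    if "folders" in parts and parts.index("folders") + 1 < len(parts):
        return parts[parts.index("folders") + 1]
    raise ValueError("no folder id found in " + url)


url = "https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk"
print(folder_id_from_url(url))
```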

Thanks for the great answers! Fastest way to get a few one-off files to Colab from Google Drive: load the Drive helper and mount.

from google.colab import drive

This will prompt for authorization.

drive.mount('/content/drive')

Open the link in a new tab; you will get a code. Copy it back into the prompt. You now have access to Google Drive. Check:

!ls "/content/drive/My Drive"

Then copy file(s) as needed:

!cp "/content/drive/My Drive/xy.py" "xy.py"

Confirm that the files were copied:

!ls
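
The same copy step can be done from Python instead of shell commands, which helps when there are many files. A minimal sketch, assuming Drive is already mounted; copy_matching is a name invented for this example.

```python
import glob
import os
import shutil


def copy_matching(src_dir, pattern, dst_dir="."):
    """Copy every file in src_dir that matches pattern into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # glob.escape keeps special characters in the folder name literal.
    for path in sorted(glob.glob(os.path.join(glob.escape(src_dir), pattern))):
        if os.path.isfile(path):
            copied.append(shutil.copy(path, dst_dir))
    return copied


# In Colab, after mounting, this would mirror the !cp line above:
# copy_matching("/content/drive/My Drive", "*.py")
```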

What I did first:

from google.colab import drive
drive.mount('/content/drive/')

Then:

%cd /content/drive/My Drive/Colab Notebooks/

After that I can, for example, read csv files with:

df = pd.read_csv("data_example.csv")

If your files are in a different location, just add the correct path after My Drive.
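
An alternative to changing directory is to build the absolute path once and pass it around. The sketch below assumes the /content/drive mount point from this answer; drive_path is a hypothetical helper, and note that one of the later answers uses "MyDrive" rather than "My Drive" as the folder name, so adjust to whatever !ls shows.

```python
from pathlib import PurePosixPath

# Mount point used in the answer above (an assumption outside that setup).
MOUNT_POINT = "/content/drive"


def drive_path(*parts, mount_point=MOUNT_POINT):
    """Build an absolute path to a file under My Drive without using %cd."""
    return str(PurePosixPath(mount_point) / "My Drive" / PurePosixPath(*parts))


csv_path = drive_path("Colab Notebooks", "data_example.csv")
print(csv_path)
# df = pd.read_csv(csv_path)  # works from any working directory once mounted
```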

Most of the previous answers are a bit (very) complicated:

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

I found this to be the easiest and fastest way to mount Google Drive in Colab. You can change the mount directory to whatever you want by changing the argument to drive.mount. It will give you a link to accept the permissions with your account; you then copy and paste the generated key, and Drive will be mounted at the selected path.

force_remount is only needed when you have to mount the drive regardless of whether it was mounted previously. You can omit this parameter if you don't want to force a remount.
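
If the same notebook is sometimes run outside Colab, a small guard around the mount call can help. This is a sketch only: ensure_drive_mounted is an invented name, and treating "a My Drive (or MyDrive) folder exists under the mount point" as proof of an existing mount is an assumption.

```python
import os


def ensure_drive_mounted(mount_point="/content/drive", force_remount=False):
    """Mount Drive only when needed; return True if the mount point is usable."""
    # Assumption: a mounted Drive shows up as "My Drive" or "MyDrive" here.
    already_mounted = any(
        os.path.isdir(os.path.join(mount_point, name))
        for name in ("My Drive", "MyDrive")
    )
    if already_mounted and not force_remount:
        return True
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False  # not running in Colab, nothing to mount
    drive.mount(mount_point, force_remount=force_remount)
    return True
```

Calling ensure_drive_mounted() at the top of a notebook then does nothing on re-runs where Drive is already visible.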

Edit: Check this out for more ways of doing IO operations in Colab: https://colab.research.google.com/notebooks/io.ipynb

You can't permanently store a file on Colab. However, you can import files from your Drive, and whenever you are done with a file you can save it back.

To mount Google Drive in your Colab session:

from google.colab import drive
drive.mount('/content/gdrive')

You can simply write to Google Drive as you would to a local file system. Your Google Drive will now appear in the Files tab. You can access any file from Colab, and read from it as well as write to it. The changes are made in real time on your Drive, and anyone who has the access link to your file can see the changes you made from Colab.

Example:

with open('/content/gdrive/My Drive/filename.txt', 'w') as f:
   f.write('values')
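
When writing into a subfolder that may not exist yet, it can help to create it first. A minimal sketch, where save_text and the results/run1 path are made up for illustration:

```python
import os


def save_text(drive_root, relative_path, text):
    """Write text under drive_root, creating missing folders first."""
    target = os.path.join(drive_root, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as f:
        f.write(text)
    return target


# In Colab, with the mount above:
# save_text('/content/gdrive/My Drive', 'results/run1/values.txt', 'values')
# drive.flush_and_unmount()  # optional: push pending writes before the runtime stops
```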

I'm lazy and my memory is bad, so I decided to create easycolab, which is easier to memorize and type:

import easycolab as ec
ec.mount()

Make sure to install it first: !pip install easycolab

The mount() method basically implements this:

from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/

To read all files in a folder:

import glob
from google.colab import drive
drive.mount('/gdrive', force_remount=True)

#!ls "/gdrive/My Drive/folder"

files = glob.glob(f"/gdrive/My Drive/folder/*.txt")
for file in files:
  do_something(file)

from google.colab import drive
drive.mount('/content/drive')

This worked perfectly for me. I was later able to use the os library to access my files just like I access them on my PC.

You can simply make use of the code snippets on the left of the screen.

Insert "Mounting Google Drive in your VM",

run the code, and copy and paste the code from the URL,

and then use !ls to check the directories:

!ls /gdrive

In most cases, you will find what you want in the directory "/gdrive/My Drive".

Then you can carry it out like this:

from google.colab import drive
drive.mount('/gdrive')
import glob

file_path = glob.glob("/gdrive/My Drive/*.txt")
for file in file_path:
    do_something(file)
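
If the files sit in nested folders, glob needs "**" together with recursive=True to reach them. A small sketch (find_files is a hypothetical name; the /gdrive path in the comment is the mount point from this answer):

```python
import glob
import os


def find_files(root, pattern="*.txt", recursive=False):
    """List files under root matching pattern, optionally in subfolders too."""
    if recursive:
        # "**" matches any number of nested folders when recursive=True.
        search = os.path.join(glob.escape(root), "**", pattern)
    else:
        search = os.path.join(glob.escape(root), pattern)
    return sorted(p for p in glob.glob(search, recursive=recursive) if os.path.isfile(p))


# In Colab, after mounting at /gdrive:
# for file in find_files("/gdrive/My Drive", "*.txt", recursive=True):
#     do_something(file)
```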

I wrote a class that downloads all of the data to the '.' location on the Colab server.

The whole thing can be pulled from here: https://github.com/brianmanderson/Copy-Shared-Google-to-Colab

!pip install PyDrive


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
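import os

# Authenticate and create the PyDrive client used by the class below.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)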

class download_data_from_folder(object):
    def __init__(self,path):
        path_id = path[path.find('id=')+3:]
        self.file_list = self.get_files_in_location(path_id)
        self.unwrap_data(self.file_list)
    def get_files_in_location(self,folder_id):
        file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
        return file_list
    def unwrap_data(self,file_list,directory='.'):
        for i, file in enumerate(file_list):
            print(str((i + 1) / len(file_list) * 100) + '% done copying')
            if file['mimeType'].find('folder') != -1:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    os.makedirs(os.path.join(directory, file['title']))
                print('Copying folder ' + os.path.join(directory, file['title']))
                self.unwrap_data(self.get_files_in_location(file['id']), os.path.join(directory, file['title']))
            else:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    downloaded = drive.CreateFile({'id': file['id']})
                    downloaded.GetContentFile(os.path.join(directory, file['title']))
        return None
data_path = 'shared_path_location'
download_data_from_folder(data_path)

For example, to extract a zip file stored on Google Drive from a Colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
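
The same extraction can be written with a with-block so the archive is closed even if something fails partway. A sketch with an invented helper name, extract_zip:

```python
import os
import zipfile


def extract_zip(zip_path, target_dir):
    """Extract zip_path into target_dir and return the archive's member names."""
    os.makedirs(target_dir, exist_ok=True)
    # The with-block closes the archive even if extraction raises.
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# In Colab, with Drive mounted as above:
# extract_zip("/content/drive/My Drive/ML/DataSet.zip", "/tmp")
```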

@wenkesj

I am talking about copying a directory and all its subdirectories.

I found a solution that looks like this:

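import os

def copy_directory(source_id, local_target):
  # `drive` is an authenticated PyDrive GoogleDrive client.
  os.makedirs(local_target, exist_ok=True)
  file_list = drive.ListFile(
    {'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
  for f in file_list:
    # Report the entry being processed.
    print(', '.join(['{}: {}'.format(key, f[key])
                     for key in ['title', 'id', 'mimeType']]))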
    if f["title"].startswith("."):
      continue
    fname = os.path.join(local_target, f['title'])
    if f['mimeType'] == 'application/vnd.google-apps.folder':
      copy_directory(f['id'], fname)
    else:
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)

Nevertheless, it looks like gDrive doesn't like copying too many files.
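
One thing that might help here is retrying a failed download after a pause. The sketch below assumes the failures are transient (for example rate limiting), which the answer does not confirm; with_retries and its parameters are names made up for this example.

```python
import time


def with_retries(action, attempts=5, base_delay=1.0):
    """Call action(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical use inside copy_directory:
# with_retries(lambda: f_.GetContentFile(fname))
```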

There are many ways to read files in your Colab notebook (.ipynb). A few are:

  1. Mounting your Google Drive in the runtime's virtual machine.
  2. Using google.colab.files.upload(), the easiest solution.
  3. Using the native REST API.
  4. Using a wrapper around the API such as PyDrive.

Methods 1 and 2 worked for me; the rest I wasn't able to figure out. If anyone can, as others tried in the posts above, please write an elegant answer. Thanks in advance!

First method:

I wasn't able to mount my Google Drive, so I installed these libraries:

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Once the installation and authorization process is finished, you first mount your drive.

!mkdir -p drive
!google-drive-ocamlfuse drive

After installation I was able to mount Google Drive. Everything in your Google Drive starts from /content/drive.

!ls /content/drive/ML/../../../../path_to_your_folder/

Now you can simply read the file from the path_to_your_folder folder into pandas using the above path.

import pandas as pd
df = pd.read_json('drive/ML/../../../../path_to_your_folder/file.json')
df.head(5)

You are supposed to use the absolute path you received, not /../..

Second method:

This is convenient if the file you want to read is present in the current working directory.

If you need to upload any files from your local file system, you could use the code below; otherwise just skip it.

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Suppose you have the folder hierarchy below in your Google Drive:

/content/drive/ML/../../../../path_to_your_folder/

Then you simply need the code below to load it into pandas.

import pandas as pd
import io
df = pd.read_json(io.StringIO(uploaded['file.json'].decode('utf-8')))
df
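
If pandas is not needed, the uploaded bytes can be parsed with the standard json module. A sketch, assuming files.upload() returns a dict of file name to raw bytes as used above; load_uploaded_json and uploaded_example are invented for illustration.

```python
import json


def load_uploaded_json(uploaded, name, encoding="utf-8"):
    """Decode one entry of the files.upload() result and parse it as JSON."""
    if name not in uploaded:
        raise KeyError("{} was not uploaded; got {}".format(name, sorted(uploaded)))
    return json.loads(uploaded[name].decode(encoding))


# Stand-in for the dict returned by files.upload(): file name -> raw bytes.
uploaded_example = {"file.json": b'{"a": [1, 2], "b": "x"}'}
print(load_uploaded_json(uploaded_example, "file.json"))
```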

Consider simply downloading the file through a permanent link with gdown, which comes preinstalled.
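
A rough sketch of what that could look like. The uc?id= URL form and the gdown.download(url, output, quiet=...) call are how gdown is commonly used; gdown_url, download_shared_file and FILE_ID are placeholders introduced here, and the file has to be shared by link for this to work.

```python
def gdown_url(file_id):
    """Direct-download URL form that gdown accepts for a shared file id."""
    return "https://drive.google.com/uc?id={}".format(file_id)


def download_shared_file(file_id, output):
    """Download a link-shared Drive file to output using gdown."""
    import gdown  # imported lazily so gdown_url works without the package
    return gdown.download(gdown_url(file_id), output, quiet=False)


# FILE_ID below is a placeholder, not a real file.
print(gdown_url("FILE_ID"))
```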

Read images from Google Drive using a Colab notebook:

import glob
images_list = glob.glob("add google drive path/*.jpg")
print(images_list)

Create the train.txt file required for YOLOv4 training:

with open("/content/drive/MyDrive/project data/obj/train.txt", "w") as file:
    file.write("\n".join(images_list))

27/12/2022 Vy update:

from google.colab import drive
drive.mount('/content/gdrive/')

