In Google CoLab Notebook, how to read data from a Public Google Drive AND my personal drive *without* authenticating twice?

I have a Google CoLab notebook that is used by third parties. The notebook needs to read CSVs both from the user's personal mounted GDrive and from a 3rd-party publicly shared GDrive. As far as I can tell, reading from each of these 2 sources requires the user to complete an authentication workflow, copying and pasting a verification code each time. The UX would be much improved if they only had to authenticate once rather than twice.

Put another way: if I've already authenticated and verified who I am to mount my drive, then why do I need to do it again to read data from a publicly shared Google Drive?

I figured there would be some way to reuse the authentication from the first method in the second method (see details below), or to somehow request both sets of permissions in a single step, but I am not having any luck figuring it out.

Background

There has been a lot written about how to read data into Google Colab notebooks: Import data into Google Colaboratory, Towards Data Science - 3 ways to load CSV files into colab, and Google CoLab's official helper notebook are some good references.

To quickly recap, you have a few options, depending on where the data is coming from. If you are working with your own data, then an easy solution is to put your data in Google Drive, and then mount your drive.

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

And you can read files as if they were in your local filesystem at /content/mountedDrive/.
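As a rough illustration of reading from the mount, here is a minimal sketch. The helper names are made up for this example, and `MyDrive` is an assumption about the folder under the mount point that holds the user's own files (it has also appeared as `My Drive`); check the actual folder name after mounting.

```python
from pathlib import PurePosixPath

# Mount point used in the snippet above.
MOUNT_POINT = PurePosixPath('/content/mountedDrive')


def mounted_path(relative, drive_root='MyDrive'):
    """Return the absolute path of a file inside the mounted Drive.

    `drive_root` is an assumption: the folder under the mount point
    that holds the user's own files.
    """
    return MOUNT_POINT / drive_root / relative


def read_mounted_csv(relative):
    """Read a CSV from the mounted Drive like any local file."""
    import pandas as pd  # imported here so mounted_path() works without pandas
    return pd.read_csv(str(mounted_path(relative)))


# Example (hypothetical file name):
# df = read_mounted_csv('data/example.csv')
```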

Sometimes mounting your drive is not sufficient. For example, let's say you want to read data from a publicly shared Google Drive owned by a 3rd party. In this case, you can't mount your drive, because the shared data is not in your Drive. You could copy all of the data out of the 3rd party's drive and into your drive, but it would be preferable to read directly from the Public Drive, especially if this is a shared notebook that many people use.

In this case, you can use PyDrive (see same references).

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

You have to look up the Drive ID for your dataset, and then you can read it, e.g., like this:

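import pandas as pd

file_id = 'YOUR_FILE_ID'  # placeholder: the Drive ID of the shared file
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('Filename.csv')  # download to the local filesystem
df = pd.read_csv('Filename.csv')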
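Looking up the Drive ID usually means copying it out of the file's share link. A small sketch of that step, assuming the two common link shapes shown in the comments; `extract_drive_id` is a hypothetical helper, not part of PyDrive or Colab.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_drive_id(url):
    """Return the file ID from a Drive share link, or None if not found."""
    parsed = urlparse(url)
    # Form 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r'/d/([A-Za-z0-9_-]+)', parsed.path)
    if match:
        return match.group(1)
    # Form 2: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get('id')
    return ids[0] if ids else None
```

The returned string is what would be passed as the `'id'` value to `drive.CreateFile(...)`.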

In both of these workflows, you must authenticate your Google Account by following a special link, copying a code, and pasting the code back into the notebook.


Here is my problem:

I want to do both of these things in the same notebook: (1) read from a mounted Google Drive and (2) read from a publicly shared GDrive. The user of my notebook is a third party. If the notebook runs both sets of code, then the user is forced to go through the verification-code authentication twice. It's a bad UX, it's confusing, and it seems like it should be unnecessary.

Things I have tried:

Regarding this code:

auth.authenticate_user() # We already authenticated when we mounted our GDrive
gauth = GoogleAuth()

I thought there might be a way to pass the gauth object into the .mount() function so that if credentials already existed, you would not need to re-request authentication with a new verification code. But I have not been able to find documentation on google.colab.drive.mount(), and guessing randomly at parameters is not working out.

Alternatively, we could go the other way around; however, I am not sure whether it is possible to save or extract authentication permissions from .mount().

Next I tried running the following code, removing the explicit authenticate_user() call after the mounting had already happened, like this:

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

The first 2 lines run as expected, including the authentication link and verification code. However, once we get to the line gauth.credentials = GoogleCredentials.get_application_default(), my 3rd-party user gets the following error:

   1260         # If no credentials, fail.
-> 1261         raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
   1262 
   1263     @staticmethod

ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
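The error text names one thing that can be checked directly: the GOOGLE_APPLICATION_CREDENTIALS environment variable. A small diagnostic along these lines, run right after mounting, would show whether the mount step left behind a credentials file for get_application_default() to find. This is only a sketch: `describe_adc_env` is a made-up helper, and it checks the environment variable alone, not every place the library may look.

```python
import os


def describe_adc_env(environ=os.environ):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS points at a real file."""
    path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        return 'GOOGLE_APPLICATION_CREDENTIALS is not set'
    if not os.path.isfile(path):
        return 'GOOGLE_APPLICATION_CREDENTIALS is set, but no file exists at ' + path
    return 'GOOGLE_APPLICATION_CREDENTIALS points to ' + path


print(describe_adc_env())
```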

I'm not 100% sure what these different lines accomplish, so I tried removing the line that raised the error as well:

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
# gauth.credentials = GoogleCredentials.get_application_default() # Commented out, hoping we don't need this line if we are already mounted? 
drive = GoogleDrive(gauth)

This now runs without error; however, when I then try to read a file from the public drive, I get the following error:

InvalidConfigError: Invalid client secrets file ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)

At this point I noticed something that is probably important:

When I run the drive-mounting code, the authentication is requesting access for Google Drive File Stream.


When I run the PyDrive authentication, the authentication is requesting access on behalf of Google Cloud SDK.


So these are different permissions.

So, the question is: is there any way to streamline this and package all of these permissions into a single verification-code authentication workflow? If I want to read from both my mounted Drive AND a publicly shared GDrive, is the notebook user required to authenticate twice?

Thanks for any pointers to documentation or examples.

There is no way to do this. The OAuth scopes are different: one is for the Google Drive file system; the other is for the Google Cloud SDK.
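Given that both prompts remain, one mitigation is to trigger them back to back from a single setup cell, so the user handles them once up front rather than being interrupted midway through the notebook. The sketch below uses only the calls already shown in the question; `setup_drive_access` is a made-up name, and it still asks the user to authenticate twice.

```python
def setup_drive_access(mount_point='/content/mountedDrive'):
    """Run both authentication flows in one place and return a PyDrive client.

    This does not merge the two prompts; it only groups them in one cell.
    """
    # Imported inside the function because these packages are only
    # expected to be available in the Colab runtime.
    from google.colab import auth, drive as colab_drive
    from oauth2client.client import GoogleCredentials
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    colab_drive.mount(mount_point)   # prompt 1: mount the personal Drive
    auth.authenticate_user()         # prompt 2: credentials used by PyDrive
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    return GoogleDrive(gauth)


# In the first cell of the notebook:
# drive = setup_drive_access()
```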

