Python: download files from Google Drive using url

I am trying to download files from Google Drive, and all I have is the drive's URL.

I have read about the Google API, which talks about some drive_service and MediaIO and also requires credentials (mainly a JSON file/OAuth). But I cannot work out how it is supposed to be used.

I also tried urllib2.urlretrieve, but my case is to get files from the drive. I tried wget too, but to no avail.

I tried the PyDrive library. It has good functions for uploading to Drive, but no download options.

Any help will be appreciated. Thanks.

If by "drive's url" you mean the shareable link of a file on Google Drive, then the following might help:

import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

The snippet does not use pydrive, nor the Google Drive SDK, though. It uses the requests module (which is, in a way, an alternative to urllib2).

When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed; see "wget/curl large file from google drive".

Having had similar needs many times, I made an extra simple class, GoogleDriveDownloader, starting from the snippet from @user115202 above. You can find the source code here.

You can also install it through pip:

pip install googledrivedownloader

Then usage is as simple as:

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)

This snippet will download an archive shared on Google Drive. In this case, 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the ID taken from the shareable link obtained from Google Drive.

I recommend the gdown package.

Take your share link

https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing

and grab the ID (e.g. 1TLNdIufzwesDbyr_nVTR7Zrx9oRHLM_N), then swap it in after id= in the URL below.

import gdown

url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)

PyDrive allows you to download a file with the function GetContentFile(). You can find the function's documentation here.

See the example below:

# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.

This code assumes that you have an authenticated drive object; the docs on this can be found here and here.

In the general case this is done like so:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

Info on silent authentication on a server can be found here. It involves writing a settings.yaml (example: here) in which you save the authentication details.

Here's an easy way to do it with a service account and no third-party libraries (only Google's own client packages).

First, pip install google-api-core and google-api-python-client.

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io

credz = {}  # put the JSON credentials here (from a service account or the like)
# More info: https://cloud.google.com/docs/authentication

credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))


Generally speaking, the URL of a file shared from Google Drive looks like this:

https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing

where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh corresponds to the file ID.

Hence, you can simply create a function to get the file ID from the URL, like this, where url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing:

def url_to_id(url):
    x = url.split("/")
    return x[5]

Printing x will give:

['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']

And so, since we want to return the 6th element of the list, we use x[5].
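Indexing into the split list depends on the exact shape of the link. A slightly more defensive sketch, assuming the ID is always the path segment that follows /d/ (as in the example URL above), could look like this. The name extract_file_id and the set of characters allowed in an ID are assumptions made for illustration, not something Google documents here.

```python
import re

# Hypothetical helper: assumes the file ID is the path segment right after "/d/"
# and consists only of letters, digits, "-" and "_".
_FILE_ID_RE = re.compile(r"/d/([A-Za-z0-9_-]+)")

def extract_file_id(url):
    """Return the file ID from a '.../file/d/<id>/...' style share link."""
    match = _FILE_ID_RE.search(url)
    if match is None:
        # Fail loudly instead of returning the wrong path segment.
        raise ValueError("no '/d/<id>' segment found in: " + url)
    return match.group(1)

print(extract_file_id(
    "https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing"
))
```

Unlike x[5], this raises a ValueError when the link has a different layout rather than silently returning some other part of the URL.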

This has also been described above:

   from pydrive.auth import GoogleAuth
   from pydrive.drive import GoogleDrive
   gauth = GoogleAuth()
   gauth.LocalWebserverAuth()
   drive = GoogleDrive(gauth)

This creates its own local web server to do the dirty work of authenticating.

   file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
   file_obj.GetContentFile('Demo.txt') 

This downloads the file.

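# Import PyDrive (auth and drive) plus the standard-library modules used below
import logging
import zipfile

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

logger = logging.getLogger(__name__)
# Assumption: UNZIP_DIR was not defined in the original snippet; point it at your extraction folder
UNZIP_DIR = './unzipped/'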

def download_tracking_file_by_id(file_id, download_dir):
    gauth = GoogleAuth(settings_file='../settings.yaml')
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("../credentials.json")
    if gauth.credentials is None:
        # Authenticate if they're not there
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("../credentials.json")

    drive = GoogleDrive(gauth)

    logger.debug("Trying to download file_id " + str(file_id))
    file6 = drive.CreateFile({'id': file_id})
    zip_path = download_dir + 'mapmob.zip'
    file6.GetContentFile(zip_path)  # download the archive
    zipfile.ZipFile(zip_path).extractall(UNZIP_DIR)  # extract the archive that was just downloaded
    tracking_data_location = download_dir + 'test.json'
    return tracking_data_location

The above function downloads the file with the given file_id to a specified download folder. Now the question remains: how do you get the file_id? Simply split the url by id= to get the file_id.

file_id = url.split("id=")[1]
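Note that split("id=")[1] keeps everything that follows the ID, so a link of the form ...uc?id=<id>&export=download would yield "<id>&export=download". A minimal sketch that uses the standard library's query-string parser instead, assuming Python 3; the function name file_id_from_query is hypothetical:

```python
from urllib.parse import urlsplit, parse_qs

def file_id_from_query(url):
    """Return the value of the 'id' query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)  # e.g. {'id': ['...'], 'export': ['download']}
    values = query.get("id")
    return values[0] if values else None

print(file_id_from_query(
    "https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download"
))
```

This only applies to links that carry the ID as a query parameter. Links of the /file/d/<id>/view form have no id= parameter, so the function returns None for them.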

You can install https://pypi.org/project/googleDriveFileDownloader/

pip install googleDriveFileDownloader

Then download the file. Here is sample code for the download:

from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")
