Google Drive Python API: export never completes

Summary:

I have an issue where the google-drive-sdk for Python sometimes does not detect the end of the document being exported. It seems to think that the Google document is of infinite size.

Background, source code and tutorials I followed:

I am working on my own Python-based google-drive backup script (one with a nice CLI for browsing around). (git link for source code)

It's still in the making and currently only finds new files and downloads them (with the 'pull' command).

To do the most important google-drive commands, I followed the official Google Drive API tutorial for downloading media (here).

What works:

When a document or file is a non-google-docs document, the file is downloaded properly. However, when I try to "export" a file, I see that I need to use a different mimeType. I have a dictionary for this.

For example: I map application/vnd.google-apps.document to application/vnd.openxmlformats-officedocument.wordprocessingml.document when exporting a document.
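A minimal sketch of what such a dictionary could look like. This is not the actual dictionary from the script; the names EXPORT_MIME_TYPES and export_mime_for are hypothetical, and the spreadsheet and presentation entries are added for illustration alongside the document mapping mentioned above.

```python
# Hypothetical mapping from Google-native MIME types to Office export formats.
EXPORT_MIME_TYPES = {
    "application/vnd.google-apps.document":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.google-apps.spreadsheet":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.google-apps.presentation":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def export_mime_for(drive_mime_type):
    """Return the export MIME type, or None if no export mapping is defined."""
    return EXPORT_MIME_TYPES.get(drive_mime_type)
```

A None result can be used as the signal that the file is not a Google-native type and should be downloaded directly instead of exported.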

When downloading Google documents from Google Drive, this seems to work fine. By this I mean: my while loop with the code status, done = downloader.next_chunk() will eventually set done to true and the download completes.

What does not work:

However, on some files, the done flag never becomes true and the script will download forever. This eventually amounts to several GB. Perhaps I am looking at the wrong flag to tell that the file is complete when doing an export. I am surprised that google-drive never throws an error. Does anybody know what could cause this?

Current status:

For now I have exporting of Google documents disabled in my code.

When I use scripts like "drive by rakyll" (at least the version I have), they just put a link to the online copy. I would really like to do a proper export so that my offline system can maintain a complete backup of everything on Drive.

P.S. It's fine to put "you should use this service instead of the API" for the sake of others finding this page. I know that there are other services out there for this, but I'm really looking to explore the drive-api functions for integration with my own other systems.

OK. I found a pseudo solution here.

The problem is that the Google API never returns the Content-Length and the response is sent in chunks. However, either the chunk returned is wrong, or the Python API is not able to process it correctly.

What I did was grab the code for MediaIoBaseDownload from here.

I left everything the same, but changed this part:

if 'content-range' in resp:
    content_range = resp['content-range']
    length = content_range.rsplit('/', 1)[1]
    self._total_size = int(length)
elif 'content-length' in resp:
    self._total_size = int(resp['content-length'])
else:
    # PSEUDO BUG FIX: No content-length, no chunk info, cut the response here.
    self._total_size = self._progress 

The else at the end is what I've added. I've also changed the default chunk size by setting DEFAULT_CHUNK_SIZE = 2*1024*1024. You will also have to copy a few imports from that file, including this one: from googleapiclient.http import _retry_request, _should_retry_response

Of course this is not a solution; it just says "if I don't understand the response, just stop it here". This will probably make some exports not work, but at least it doesn't kill the server. This is only until we can find a good solution.
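A different way to keep a runaway download from filling the disk, without patching the library, is to cap the number of bytes written. The sketch below is an assumption-laden illustration and not part of the fix above: download_with_cap and DownloadTooLarge are hypothetical names, and the function only assumes the downloader has a next_chunk() method returning (status, done) and writes into fh, as MediaIoBaseDownload does.

```python
import io


class DownloadTooLarge(Exception):
    """Raised when a download writes more bytes than the configured limit."""


def download_with_cap(downloader, fh, max_bytes):
    """Call next_chunk() until done, aborting once fh holds more than max_bytes.

    `downloader` is any object with next_chunk() -> (status, done) that writes
    into `fh`, for example a googleapiclient.http.MediaIoBaseDownload instance.
    """
    done = False
    while not done:
        _status, done = downloader.next_chunk()
        # fh.tell() is the write position, i.e. the bytes received so far.
        if fh.tell() > max_bytes:
            raise DownloadTooLarge(
                "wrote %d bytes, limit is %d" % (fh.tell(), max_bytes))
    return fh
```

This does not make the affected exports succeed; it only turns an endless loop into an error that the caller can log and skip.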

UPDATE:

The bug is already reported here: https://github.com/google/google-api-python-client/issues/15

and as of January 2017, the only workaround is to not use MediaIoBaseDownload and do this instead (not suitable for large files):

req = service.files().export(fileId=file_id, mimeType=mimeType)
resp = req.execute(http=http)
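To turn that single request into a file on disk, something like the following could work. It is a sketch under assumptions: save_export is a hypothetical helper, service is an already-built Drive client, and execute() is assumed to return the exported content as bytes, as the workaround above implies.

```python
def save_export(service, file_id, mime_type, path):
    """Export a Google-native file in one request and write the bytes to `path`.

    Returns the number of bytes written. The whole export is held in memory,
    which is why this approach is not suitable for large files.
    """
    request = service.files().export(fileId=file_id, mimeType=mime_type)
    content = request.execute()  # assumed to be the raw exported bytes
    with open(path, "wb") as out:
        out.write(content)
    return len(content)
```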

I'm using this and it works with the following libraries:

google-auth-oauthlib==0.4.1
google-api-python-client
google-auth-httplib2

This is the snippet I'm using:

import io

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

def download_google_document_from_drive(self, file_id):
    """Download a Drive file into memory; returns a BytesIO, or None on error."""
    try:
        # get_media fetches the file's stored bytes in chunks
        request = self.service.files().get_media(fileId=file_id)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print('Download %d%%.' % int(status.progress() * 100))
        return fh
    except Exception as e:
        print('Error downloading file from Google Drive: %s' % e)
        return None

You can then use the returned stream directly, for example to open it as a workbook with xlrd:

import xlrd
workbook = xlrd.open_workbook(file_contents=fh.getvalue())
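If the goal is to save the downloaded stream to disk instead, a minimal sketch could look like this. save_stream is a hypothetical helper name, and fh is assumed to be the io.BytesIO returned by the download function.

```python
import io


def save_stream(fh, path):
    """Write the full contents of an in-memory stream to a file on disk."""
    with open(path, "wb") as out:
        # getvalue() returns every byte written so far, regardless of position.
        out.write(fh.getvalue())
```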
