Google App Engine: How to write large files to Google Cloud Storage

I am trying to copy large files from Google App Engine's Blobstore to Google Cloud Storage as a backup.

It works fine for small files (< 10 MB), but for larger files it becomes unstable and GAE throws a FileNotOpenedError.

My code:

PATH = '/gs/backupbucket/'
for df in DocumentFile.all():           
  fn = df.blob.filename
  br = blobstore.BlobReader(df.blob)
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private') 
  with files.open(write_path, 'a') as fp:
    while True:
      buf = br.read(100000)
      if buf=="": break
      fp.write(buf)
  files.finalize(write_path)

(This runs in a task queue to avoid exceeding the request execution time limit.)

It throws a FileNotOpenedError:

Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()

I investigated further and, according to a comment on GAE Issue 5371, the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.

I tried to work around this by closing and reopening the file at intervals. The code below is edited from the first version of this post: I added a 0.5 second pause between closing and reopening the file. It now throws a WrongOpenModeError.

My code (updated):

PATH = '/gs/backupbucket/'
for df in DocumentFile.all():           
  fn = df.blob.filename
  br = blobstore.BlobReader(df.blob)
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private') 
  fp = files.open(write_path, 'a')
  c = 0
  while True:       
    if (c == 5):
      c = 0
      fp.close()
      files.finalize(write_path)
      time.sleep(0.5)
      fp = files.open(write_path, 'a')
    c = c + 1
    buf = br.read(100000)
    if buf=="": break
    fp.write(buf)
  files.finalize(write_path)

Stack trace:

Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
    self._make_rpc_call_with_retry('Append', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
    raise WrongOpenModeError()

I tried to find information about the WrongOpenModeError, but the only place it is mentioned is in google/appengine/api/files/file.py itself.

Suggestions on how to get around this and save large files to Google Cloud Storage as well would be greatly appreciated. Thanks!

IMO you should not call files.finalize(write_path) at intervals: finalize makes the file readable, and you cannot switch it back to writable afterwards.
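A minimal sketch of that idea, using an in-memory stand-in rather than the real Files API. The `FakeFiles` class, `_FakeHandle`, and `copy_in_chunks` are hypothetical names made up for illustration, and the sketch assumes the real API allows an unfinalized file to be reopened for append. The handle is closed and reopened every few chunks, while finalize is called exactly once after the last write:

```python
import io


class FakeFiles(object):
    """Hypothetical in-memory stand-in for a Files-like API (not the real one)."""

    def __init__(self):
        self.data = {}          # path -> bytearray of written content
        self.finalized = set()  # paths that can no longer be opened for append

    def create(self, path):
        self.data[path] = bytearray()
        return path

    def open(self, path, mode):
        if mode == 'a' and path in self.finalized:
            # Assumed behaviour: no appends are possible after finalize.
            raise IOError('file is finalized; cannot open for append')
        return _FakeHandle(self.data[path])

    def finalize(self, path):
        self.finalized.add(path)


class _FakeHandle(object):
    def __init__(self, buf):
        self._buf = buf
        self.closed = False

    def write(self, chunk):
        if self.closed:
            raise IOError('file not opened')
        self._buf.extend(chunk)

    def close(self):
        self.closed = True


def copy_in_chunks(files_api, reader, path, chunk_size=4, reopen_every=2):
    """Copy reader to path, reopening periodically and finalizing once."""
    fp = files_api.open(path, 'a')
    written = 0
    while True:
        buf = reader.read(chunk_size)
        if not buf:
            break
        if written and written % reopen_every == 0:
            fp.close()                      # close, but do NOT finalize here
            fp = files_api.open(path, 'a')  # reopen for append
        fp.write(buf)
        written += 1
    fp.close()
    files_api.finalize(path)  # exactly once, after the last write


fake = FakeFiles()
target = fake.create('/gs/backupbucket/example.zip')
copy_in_chunks(fake, io.BytesIO(b'0123456789abcdef'), target)
print(bytes(fake.data[target]))
```

In the question's updated code, the equivalent change would be to drop the files.finalize(write_path) call inside the loop and keep only the one after it.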

I was having the same issue and ended up writing an iterator around blobstore.fetch_data and catching the exception. It works, but it is a work-around.

Rewriting your code would look something like this:

from google.appengine.ext import blobstore
from google.appengine.api import files

def iter_blobstore(blob, fetch_size=524288):
  # Yield the blob's content in chunks of at most fetch_size bytes.
  start_index = 0

  while True:
    # end_index is inclusive, so subtract 1 to avoid re-reading a byte.
    read = blobstore.fetch_data(blob, start_index, start_index + fetch_size - 1)

    if read == "":
      break

    start_index += len(read)

    yield read


PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
  fn = df.blob.filename
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
  with files.open(write_path, 'a') as fp:
    for buf in iter_blobstore(df.blob):
      try:
        fp.write(buf)
      except files.FileNotOpenedError:
        # Work-around: the error is swallowed, so this chunk is not retried.
        pass
  files.finalize(write_path)
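One detail worth checking in an iterator like this is the index arithmetic. As far as I know, blobstore.fetch_data treats end_index as inclusive, so each range has to end one byte before the next one starts; otherwise a byte is copied twice at every chunk boundary. A small sketch of non-overlapping inclusive ranges, where `inclusive_ranges` is a hypothetical helper and a plain byte string stands in for the blob:

```python
def inclusive_ranges(total_size, fetch_size):
    """Yield (start, end) pairs with an inclusive end, covering total_size bytes."""
    start = 0
    while start < total_size:
        end = min(start + fetch_size, total_size) - 1
        yield start, end
        start = end + 1


blob_bytes = b'abcdefghij'  # stand-in for blob content
# Slicing uses an exclusive end, hence e + 1 for an inclusive range.
chunks = [blob_bytes[s:e + 1] for s, e in inclusive_ranges(len(blob_bytes), 4)]
print(chunks)
```

Joining the chunks gives back the original bytes with nothing duplicated or dropped, which is the property the copy loop depends on.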

Are backends an option for you? A backend runs in the background and is much more powerful than a TaskQueue task.
