
Google ML-Engine: Cloud Storage as a file

I am working in Python with Google Cloud ML-Engine. The documentation I have found indicates that data storage should be done with Buckets and Blobs:

https://cloud.google.com/ml-engine/docs/tensorflow/working-with-cloud-storage

However, much of my code, and the libraries it calls, works with files. Can I somehow treat Google Cloud Storage as a file system in my ml-engine code?

I want my code to read like this:

with open(<something>) as f:
   for line in f:
      dosomething(line)

Note that in ml-engine one does not create and configure VM instances, so I cannot mount my own shared filesystem with Filestore.

The only way to have Cloud Storage appear as a filesystem is to mount a bucket as a file system:

"You can use the Google Cloud Storage FUSE tool to mount a Cloud Storage bucket to your Compute Engine instance. The mounted bucket behaves similarly to a persistent disk even though Cloud Storage buckets are object storage."

But you cannot do that if you cannot create and configure VMs.

"Note that in ml-engine one does not create and configure VM instances."

That's not entirely true. ML Engine supports building custom containers, which is typically how one installs and configures OS-level dependencies. This is available only for training, though, so if your needs are in that area it may be worth a try.

I assume you have already checked that the library does not support access through an already open file-like handle. If you have not checked, this question may be of interest: How to restore Tensorflow model from Google bucket without writing to filesystem?
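If the library does accept an open file object, one way to avoid the filesystem entirely is to download the object into memory and hand over an in-memory file. The following is only a minimal sketch, assuming the google-cloud-storage package is installed and credentials are available to the job. The names read_blob_as_text and count_lines are hypothetical helpers introduced for illustration, and the bucket and object names are placeholders.

```python
import io


def read_blob_as_text(bucket_name, blob_name):
    """Download one object and wrap it in an in-memory text file."""
    # Imported lazily; requires the google-cloud-storage package
    # and working credentials (an assumption, not shown here).
    from google.cloud import storage
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return io.StringIO(blob.download_as_bytes().decode("utf-8"))


def count_lines(f):
    """Stand-in for any library call that accepts an open file object."""
    return sum(1 for _ in f)


# Usage (bucket and object names are placeholders):
# with read_blob_as_text("bucket_name", "foobar.txt") as f:
#     print(count_lines(f))
```

This holds the whole object in memory, so it is probably only suitable for files that comfortably fit in RAM.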

For those who come after, here is the answer, taken from this question:

Google Cloud ML and GCS Bucket issues

from tensorflow.python.lib.io import file_io

Here is an example:

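# Write a small text file to the bucket. Cloud Storage paths use the gs:// scheme.
with file_io.FileIO("gs://bucket_name/foobar.txt", "w") as f:
    f.write("FOO")
    f.flush()
    print("Write foobar.txt")

# Read it back line by line, as with an ordinary file.
with file_io.FileIO("gs://bucket_name/foobar.txt", "r") as f:
    for line in f:
        print("Read foobar.txt: " + line)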
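Some libraries insist on a real local path rather than a file object. In that case one option is to stage the object on local disk first and then use the built-in open. The following is a minimal sketch, assuming TensorFlow 1.13 or later, where the public tf.io.gfile module (the supported counterpart of the file_io module) provides copy. The names fetch_to_local and copy_fn are hypothetical, introduced for illustration, and the gs:// path in the usage comment is a placeholder.

```python
import os
import tempfile


def _tf_copy(src, dst):
    # Imported lazily so the helper can be exercised without TensorFlow.
    import tensorflow as tf
    tf.io.gfile.copy(src, dst, overwrite=True)


def fetch_to_local(src, copy_fn=_tf_copy):
    """Copy src (for example a gs:// path) into a temp dir; return the local path."""
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(src))
    copy_fn(src, local_path)
    return local_path


# Usage (bucket and object names are placeholders):
# path = fetch_to_local("gs://bucket_name/foobar.txt")
# with open(path) as f:
#     for line in f:
#         print(line)
```

The copy_fn parameter exists only so the helper can be tried with a local copy function. On ML Engine the default, which goes through TensorFlow, would be the one to use.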
