
Reading and writing pickles using Google Cloud

I want to read an existing pickle (containing a dictionary) that is stored in a folder inside a Google Cloud Storage bucket, then update the pickle after performing some operations, which amounts to overwriting it.

Traditionally I would do something like:

import pickle

# Read pickle:
pickle_in = open('dictionary.pickle', 'rb')
my_dictionary = pickle.load(pickle_in)
pickle_in.close()

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

# Overwrite pickle:
pickle_out = open('dictionary.pickle', 'wb')
pickle.dump(my_modified_dictionary, pickle_out)
pickle_out.close()

Now I need to do something similar, but on Google Cloud, so I need to change the file path and use cloudstorage.open():

import pickle
import cloudstorage

my_path = '/bucket_name/pickle_folder/my_dictionary.pickle'

# Read pickle:
pickle_in = cloudstorage.open(my_path, 'r')
my_dictionary = pickle.load(pickle_in)
pickle_in.close()

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

# Overwrite pickle:
pickle_out = cloudstorage.open(my_path, 'w')
pickle.dump(my_modified_dictionary, pickle_out)
pickle_out.close()

Will this work? cloudstorage.open() seems to be the equivalent of open(), but I am not sure whether dumping the pickle to the specified path will actually overwrite the pickle in that folder.

The basic idea of doing a read-modify-write against GCS is possible. Be aware that this will not work well with concurrency: if a second process reads before the first writes back, then when the second process writes back it will lose the first process's changes. The best solution to this is to use a database rather than pickling to GCS.
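As an illustration of the database route, here is a minimal sketch using Cloud Firestore; the collection and document names ('dictionaries', 'my_dictionary') are hypothetical placeholders, and it assumes the dictionary only contains Firestore-compatible value types:

from google.cloud import firestore

db = firestore.Client()

# Hypothetical collection/document names, used only for illustration.
doc_ref = db.collection('dictionaries').document('my_dictionary')

# Read the current dictionary (to_dict() is None if the document doesn't exist yet).
snapshot = doc_ref.get()
my_dictionary = snapshot.to_dict() or {}

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS
my_dictionary['new_register'] = 42

# Write it back; for stronger read-modify-write safety you could wrap
# this in a Firestore transaction instead of a plain set().
doc_ref.set(my_dictionary)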

In addition, be aware that pickle is not secure, and you should not load pickles you didn't write.
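If the dictionary only holds simple JSON-serializable values, one hedged alternative (not something this answer prescribes) is to store it as JSON rather than a pickle, since loading JSON cannot execute arbitrary code the way unpickling can:

import json

# Serialize the dictionary to a plain string instead of pickled bytes.
json_out = json.dumps(my_modified_dictionary)

# Deserializing JSON is safe even if the data came from an untrusted source.
my_dictionary = json.loads(json_out)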

If you do still want to use GCS for this, you should use the standard GCS client library, something like:

import pickle
from google.cloud import storage

storage_client = storage.Client()

bucket = storage_client.bucket('your-gcs-bucket')
blob = bucket.blob('dictionary.pickle')

# Download the pickled bytes and deserialize them:
pickle_in = blob.download_as_string()
my_dictionary = pickle.loads(pickle_in)

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

# Serialize and upload, overwriting the existing object:
pickle_out = pickle.dumps(my_modified_dictionary)
blob.upload_from_string(pickle_out)
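If you stay on GCS but are worried about the lost-update problem mentioned above, one possible mitigation (a sketch, not part of the original answer) is to use GCS generation preconditions so the write is rejected if the object changed after it was read:

import pickle
from google.cloud import storage
from google.api_core.exceptions import PreconditionFailed

storage_client = storage.Client()
bucket = storage_client.bucket('your-gcs-bucket')
blob = bucket.blob('dictionary.pickle')

# reload() fetches the object's metadata, including its current generation.
blob.reload()
read_generation = blob.generation

my_dictionary = pickle.loads(blob.download_as_string())

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

try:
    # The upload fails with a 412 Precondition Failed error if another
    # process overwrote the object after we read it.
    blob.upload_from_string(pickle.dumps(my_dictionary),
                            if_generation_match=read_generation)
except PreconditionFailed:
    # Someone else wrote first; re-read the object and retry the whole cycle.
    pass

On a precondition failure you would typically re-read the object, reapply your changes, and try the upload again.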
