![](/img/trans.png)
[英]Google Cloud Dataflow can't import 'google.cloud.datastore'
[英]CloudDataflow can not use “google.cloud.datastore” package?
我想在CloudDataflow上放置带有事务的数据存储区。 所以,我在下面写道。
def exe_dataflow():
....
from google.cloud import datastore
# call from pipeline
def ds_test(content):
datastore_client = datastore.Client()
kind = 'test_out'
name = 'change'
task_key = datastore_client.key(kind, name)
for _ in range(3):
with datastore_client.transaction():
current_value = client.get(task_key)
current_value['v'] += content['v']
datastore_client.put(task)
# pipeline
....
| 'datastore test' >> beam.Map(ds_test)
但是,出现错误并且日志消息显示如下。
(7b75e0ef2db229da): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'datastore'
CloudDataflow不能使用“google.cloud.datastore”包吗?
加2018/2/28。
我将--requirements_file添加到MyOption
options = MyOptions(flags = ["--requirements_file", "./requirements.txt"])
我做了requirements.txt
google-cloud-datastore==1.5.0
但是,发生了另一个错误。
(366397598dcf7f02): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "my_dataflow.py", line 66, in to_entity
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 60, in <module>
from google.cloud.datastore.batch import Batch
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module>
from google.cloud.datastore import helpers
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/__init__.py", line 17, in <module>
from google.cloud.datastore_v1 import types
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/types.py", line 21, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/datastore_pb2.py", line 17, in <module>
from google.cloud.datastore_v1.proto import entity_pb2 as google_dot_cloud_dot_datastore__v1_dot_proto_dot_entity__pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/entity_pb2.py", line 28, in <module>
dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,])
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "google/cloud/datastore_v1/proto/entity.proto":
google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
...(SNIP)...
google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/datastore_v1/proto/entity.proto". To use it here, please add the necessary import.
从Cloud Dataflow Pipeline与Cloud Datastore交互的推荐方法是使用Datastore I / O API,该API可通过Dataflow SDK获得,并提供一些方法来读取和写入数据到Cloud Datoreore数据库。
您可以在此链接中找到Dataflow SDK 2.x for Python的数据存储I / O包的详细文档。 datastore.v1.datastoreio
模块是您要使用的特定模块。 我共享的链接中有大量信息,但简而言之,它是数据存储区的连接器,它使用PTransform
使用类ReadFromDatastore()
/ WriteToDatastore()
/ DeleteFromDatastore()
从数据存储区读取 / 写入 / 删除 PCollection
。
您应该尝试使用它而不是自己实现调用。 我怀疑这可能是您看到错误的原因,因为Dataflow SDK中已存在数据存储区实现:
"google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
更新:
看起来这三个类收集了几个突变并在单个事务中执行它们。 您可以在描述类的代码中检查它。
如果目标是检索( get()
)然后更新( put()
)数据存储区实体,则可以使用文档中描述的write_mutations()
函数 ,并且可以使用完整批处理执行您感兴趣的操作的突变 。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.