Google Cloud Dataflow can't import 'google.cloud.datastore'
This is my import code:
from __future__ import absolute_import
import datetime
import json
import logging
import re
import apache_beam as beam
from apache_beam import combiners
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
from apache_beam.pvalue import AsDict
from apache_beam.pvalue import AsSingleton
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.proto.datastore.v1 import query_pb2
from google.cloud import datastore
from googledatastore import helper as datastore_helper, PropertyFilter
# datastore entities that we need to perform the mapping computations
#from models import UserPlan, UploadIntervalCount, RollingMonthlyCount
This is what my requirements.txt file looks like:
$ cat requirements.txt
Flask==0.12.2
apache-beam[gcp]==2.1.1
gunicorn==19.7.1
google-cloud-dataflow==2.1.1
six==1.10.0
google-cloud-datastore==1.3.0
google-cloud
This is all in the /lib directory. The /lib directory contains the following:
$ ls -1 lib/google/cloud
__init__.py
_helpers.py
_helpers.pyc
_http.py
_http.pyc
_testing.py
_testing.pyc
bigquery
bigtable
client.py
client.pyc
datastore
dns
environment_vars.py
environment_vars.pyc
error_reporting
exceptions.py
exceptions.pyc
gapic
iam.py
iam.pyc
language
language_v1
language_v1beta2
logging
monitoring
obselete.py
obselete.pyc
operation.py
operation.pyc
proto
pubsub
resource_manager
runtimeconfig
spanner
speech
speech_v1
storage
translate.py
translate.pyc
translate_v2
videointelligence.py
videointelligence.pyc
videointelligence_v1beta1
vision
vision_v1
Notice that both google.cloud.datastore and google.cloud.proto exist in the /lib folder. However, this import line works fine:
from google.cloud.proto.datastore.v1 import query_pb2
but this one fails:
from google.cloud import datastore
This is the exception (taken from the Google Cloud Dataflow console online):
(9b49615f4d91c1fb): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute
op.start()
File "apache_beam/runners/worker/operations.py", line 294, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10607)
def start(self):
File "apache_beam/runners/worker/operations.py", line 295, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10501)
with self.scoped_start_state:
File "apache_beam/runners/worker/operations.py", line 300, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:9702)
pickler.loads(self.spec.serialized_fn))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 225, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
return load(file)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
obj = pik.load()
File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
File "/usr/local/lib/python2.7/dist-packages/dataflow_pipeline/counters_pipeline.py", line 25, in <module>
from google.cloud import datastore
ImportError: No module named datastore
Why can't it find the package?
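One way to see which names a given Python environment can actually resolve (independent of Dataflow; this is an illustrative sketch using only the standard library, not something from the original post) is to probe for module specs:

```python
import importlib.util

def module_available(name):
    # True if `name` resolves to an importable top-level module
    # in the current environment.
    return importlib.util.find_spec(name) is not None

# On the Dataflow worker, "google.cloud.proto.datastore.v1" resolved
# but "google.cloud.datastore" did not, hence the ImportError above.
print(module_available("json"))                # stdlib module -> True
print(module_available("not_a_real_module"))  # missing -> False
```

Logging a check like this from inside the pipeline code can confirm whether the worker environment is missing a dependency, as opposed to a problem in the code itself.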
External dependencies must be installed via setup.py, and this file should be passed to the pipeline with the --setup_file parameter. In setup.py you can either install your package with a custom command

pip install google-cloud-datastore==1.3.0

or add your package to REQUIRED_PACKAGES:

REQUIRED_PACKAGES = ["google-cloud-datastore==1.3.0"]
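A minimal setup.py along these lines might look like the following (the package name and version are placeholders, not from the original post):

```python
import setuptools

# Packages the Dataflow workers must install before running the pipeline;
# pin the versions to match what you tested locally.
REQUIRED_PACKAGES = [
    "google-cloud-datastore==1.3.0",
]

setuptools.setup(
    name="counters-pipeline",   # placeholder name
    version="0.0.1",            # placeholder version
    packages=setuptools.find_packages(),
    install_requires=REQUIRED_PACKAGES,
)
```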
The reason you need to specify it in setup.py is that the libraries you have in appengine_config are not used during the Dataflow execution. App Engine only acts as a scheduler here: it merely deploys the job to the Dataflow engine. Dataflow then creates worker machines that execute your pipeline, and those workers are not connected in any way to App Engine. Dataflow workers must have every package required for your pipeline to execute, which is why you need to specify the required packages in the setup.py file. Dataflow workers use this file to "set themselves up".