
Google Cloud Dataflow can't import 'google.cloud.datastore'

This is my import code:

from __future__ import absolute_import

import datetime
import json
import logging
import re

import apache_beam as beam
from apache_beam import combiners
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
from apache_beam.pvalue import AsDict
from apache_beam.pvalue import AsSingleton
from apache_beam.options.pipeline_options import PipelineOptions

from google.cloud.proto.datastore.v1 import query_pb2
from google.cloud import datastore
from googledatastore import helper as datastore_helper, PropertyFilter

# datastore entities that we need to perform the mapping computations
#from models import UserPlan, UploadIntervalCount, RollingMonthlyCount

This is what my requirements.txt file looks like:

$ cat requirements.txt
Flask==0.12.2
apache-beam[gcp]==2.1.1
gunicorn==19.7.1
google-cloud-dataflow==2.1.1
six==1.10.0
google-cloud-datastore==1.3.0
google-cloud

This is all in the /lib directory. The /lib directory contains the following:

$ ls -1 lib/google/cloud
__init__.py
_helpers.py
_helpers.pyc
_http.py
_http.pyc
_testing.py
_testing.pyc
bigquery
bigtable
client.py
client.pyc
datastore
dns
environment_vars.py
environment_vars.pyc
error_reporting
exceptions.py
exceptions.pyc
gapic
iam.py
iam.pyc
language
language_v1
language_v1beta2
logging
monitoring
obselete.py
obselete.pyc
operation.py
operation.pyc
proto
pubsub
resource_manager
runtimeconfig
spanner
speech
speech_v1
storage
translate.py
translate.pyc
translate_v2
videointelligence.py
videointelligence.pyc
videointelligence_v1beta1
vision
vision_v1

Notice that both google.cloud.datastore and google.cloud.proto exist in the /lib folder. However, this import line works fine:

from google.cloud.proto.datastore.v1 import query_pb2

but this one failed:

from google.cloud import datastore

This is the exception (taken from the Google Cloud Dataflow console online):

(9b49615f4d91c1fb): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 294, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10607)
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 295, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10501)
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 300, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:9702)
    pickler.loads(self.spec.serialized_fn))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 225, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_pipeline/counters_pipeline.py", line 25, in <module>
    from google.cloud import datastore
ImportError: No module named datastore

Why can't it find the package?

External dependencies must be installed via setup.py, and this file should be specified in the pipeline parameters as --setup_file. In setup.py you can either install your package by using a custom command:

pip install google-cloud-datastore==1.3.0

or by adding your package to REQUIRED_PACKAGES:

REQUIRED_PACKAGES = ["google-cloud-datastore==1.3.0"]
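
Putting that together, a minimal setup.py sketch that wires REQUIRED_PACKAGES into the worker install step might look like the following; the package name and version are placeholders I've assumed for illustration, not part of the original question.

import setuptools

# Packages the Dataflow workers must install before running the pipeline.
REQUIRED_PACKAGES = [
    "google-cloud-datastore==1.3.0",
]

setuptools.setup(
    name="dataflow-pipeline",   # hypothetical package name
    version="0.0.1",
    packages=setuptools.find_packages(),
    install_requires=REQUIRED_PACKAGES,
)

Anything listed in install_requires is installed on each worker as part of the setup step, before your pipeline code is unpickled.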

The reason you need to specify it in setup.py is that the libraries you have in appengine_config are not used during the Dataflow execution. App Engine only acts as a scheduler here: it merely deploys the job to the Dataflow service. Dataflow then creates worker machines that execute your pipeline, and those workers are not connected in any way to App Engine. The Dataflow workers must have every package required for your pipeline to execute, which is why you need to specify the required packages in the setup.py file. The workers use this file to "set themselves up".
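
As a rough sketch of how the pipeline might then be launched so the workers actually receive this file (the project and bucket below are placeholders; the script name is taken from the traceback above):

$ python counters_pipeline.py \
    --runner DataflowRunner \
    --project <your-gcp-project> \
    --temp_location gs://<your-bucket>/temp \
    --setup_file ./setup.py

Without --setup_file (or an equivalent --requirements_file), the workers never install google-cloud-datastore, so the import fails on the worker even though it works locally.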
