Google Cloud Dataflow (Python) - Not installing dependencies correctly

I'm trying to run the official Dataflow example here: https://github.com/GoogleCloudPlatform/dataflow-prediction-example

However, the Dataflow job fails to start (the same error happens with my other jobs too) because of errors like the following in the logs:

    (1st) Successfully built tensorflow-module
    (2nd) Could not install packages due to an EnvironmentError:
    [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/tensorflow-1.9.0.dist-info/METADATA'

I followed the directions on GitHub exactly. Here is the pip freeze output of the virtualenv I created for this example:

    absl-py==0.4.0
    apache-beam==2.6.0
    astor==0.7.1
    avro==1.8.2
    backports.weakref==1.0.post1
    cachetools==2.1.0
    certifi==2018.8.13
    chardet==3.0.4
    crcmod==1.7
    dill==0.2.8.2
    docopt==0.6.2
    enum34==1.1.6
    fasteners==0.14.1
    funcsigs==1.0.2
    future==0.16.0
    futures==3.2.0
    gapic-google-cloud-pubsub-v1==0.15.4
    gast==0.2.0
    google-apitools==0.5.20
    google-auth==1.5.1
    google-auth-httplib2==0.0.3
    google-cloud-bigquery==0.25.0
    google-cloud-core==0.25.0
    google-cloud-pubsub==0.26.0
    google-gax==0.15.16
    googleapis-common-protos==1.5.3
    googledatastore==7.0.1
    grpc-google-iam-v1==0.11.4
    grpcio==1.14.1
    hdfs==2.1.0
    httplib2==0.11.3
    idna==2.7
    Markdown==2.6.11
    mock==2.0.0
    monotonic==1.5
    numpy==1.14.5
    oauth2client==4.1.2
    pbr==4.2.0
    ply==3.8
    proto-google-cloud-datastore-v1==0.90.4
    proto-google-cloud-pubsub-v1==0.15.4
    protobuf==3.6.1
    pyasn1==0.4.4
    pyasn1-modules==0.2.2
    pydot==1.2.4
    pyparsing==2.2.0
    pytz==2018.4
    PyVCF==0.6.8
    PyYAML==3.13
    requests==2.19.1
    rsa==3.4.2
    six==1.11.0
    tensorboard==1.10.0
    tensorflow==1.10.0
    termcolor==1.1.0
    typing==3.6.4
    urllib3==1.23
    Werkzeug==0.14.1

This pip dependency issue happened in every other job I tried, so I decided to try the official GitHub example, and it happens there too.

The job ID is 2018-08-15_23_42_57-394561747688459326, and I'm using Python 2.7.

Thanks for the help and any pointers!

As explained in the Apache Beam documentation on managing Python pipeline dependencies, the recommended approach for PyPI dependencies is to list them in a requirements.txt file and then pass that file with the following command-line option (a missing or incorrect option here may be what caused the issue you ran into):

    --requirements_file requirements.txt
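A minimal sketch of the requirements-file approach, assuming the pipeline is launched from a Python script that forwards its arguments to Beam. It writes a requirements.txt containing only the extra PyPI packages and builds the argument list that would be handed to apache_beam's PipelineOptions. The helper names, the PyVCF pin (borrowed from the pip freeze above purely as an example), the project ID and the bucket are hypothetical placeholders.

```python
import os
import tempfile


def write_requirements(directory, packages):
    """Write one requirement specifier per line; return the file's path."""
    path = os.path.join(directory, "requirements.txt")
    with open(path, "w") as f:
        f.write("\n".join(packages) + "\n")
    return path


def dataflow_args(requirements_path, project, temp_location):
    """Command-line options for a Dataflow run (all values are placeholders)."""
    return [
        "--runner=DataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
        # Tells Beam to stage these packages and pip-install them on each worker.
        "--requirements_file=" + requirements_path,
    ]


# Hypothetical extra dependency; list only packages the workers do not already have.
workdir = tempfile.mkdtemp()
req_path = write_requirements(workdir, ["PyVCF==0.6.8"])
args = dataflow_args(req_path, "my-project", "gs://my-bucket/tmp")

# In the real launcher, something like:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(args)
print(args)
```

Keeping the requirements file short matters here: every entry is installed on every worker at startup, so packages that the worker image already provides only add another chance for pip to trip over the preinstalled copy.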

In any case, as I can see in the latest sample on how to run Apache Beam with TensorFlow, the code actually passes the list of packages to install through the install_requires option of setuptools. That is another approach you can follow, and I see it is what solved your issue.
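A minimal setup.py sketch of the install_requires approach, assuming the job is launched with Beam's --setup_file option pointing at this file. The package name, version and the PyVCF entry are hypothetical placeholders, not taken from the official sample, and the setuptools call is guarded so the module can also be imported without side effects.

```python
import sys

# Hypothetical extra PyPI dependencies; leave out anything already installed on the workers.
REQUIRED_PACKAGES = [
    "PyVCF==0.6.8",
]


def setup_kwargs(packages):
    """Keyword arguments for setuptools.setup(); name and version are placeholders."""
    return {
        "name": "my-pipeline",
        "version": "0.0.1",
        "install_requires": packages,
    }


# A real setup.py is always run with a command (for --setup_file, Beam runs
# "setup.py sdist"), so this guard only skips setuptools on a bare import
# or a run without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    import setuptools

    kwargs = setup_kwargs(REQUIRED_PACKAGES)
    kwargs["packages"] = setuptools.find_packages()
    setuptools.setup(**kwargs)
```

Compared with a requirements file, this bundles the pipeline code and its extra dependencies into one installable package, which is convenient once the pipeline spans several local modules.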

I eventually solved this issue by removing my requirements.txt file and listing the few additional libraries my app uses in my setup.py file, leaving out the dependencies that are already installed on the Dataflow workers (https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#version-250_1).
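The "leave out what the workers already provide" step can be sketched as a small filter over the local requirement list. Names are compared after PEP 503 normalization (lowercase, runs of "-", "_" and "." collapsed to "-"), the rule pip uses, so google_cloud_core and Google-Cloud-Core count as the same project. WORKER_PREINSTALLED is an illustrative subset only (tensorflow is included because the worker path in the log above shows a preinstalled copy); the real list should come from the worker dependencies page for your SDK version, and the helper names are hypothetical.

```python
import re

# Illustrative subset only: take the real list from the Dataflow
# "SDK worker dependencies" page for the SDK version you use.
WORKER_PREINSTALLED = {"numpy", "tensorflow"}


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(spec):
    """Project name at the start of a specifier such as 'numpy==1.14.5'."""
    match = re.match(r"\s*([A-Za-z0-9._-]+)", spec)
    if match is None:
        raise ValueError("cannot parse requirement: %r" % spec)
    return match.group(1)


def extra_requirements(requirements, preinstalled):
    """Keep only the requirements that the workers do not already provide."""
    provided = set(normalize(name) for name in preinstalled)
    return [r for r in requirements
            if normalize(requirement_name(r)) not in provided]


local = ["numpy==1.14.5", "PyVCF==0.6.8", "tensorflow==1.10.0"]
print(extra_requirements(local, WORKER_PREINSTALLED))  # ['PyVCF==0.6.8']
```

The filtered list is what would then go into install_requires in setup.py.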

Nevertheless, I'm not sure this is the right solution, since the GitHub example itself only worked after I removed the pip install tensorflow command from its setup.py file.

Hope this helps someone! :)

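Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.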

 