
How can I install a Python package onto Google Dataflow and import it into my pipeline?

My folder structure is as follows:

Project/
--Pipeline.py
--setup.py
--dist/
    --ResumeParserDependencies-0.1.tar.gz
--Dependencies/
    --Module1.py
    --Module2.py
    --Module3.py

My setup.py file looks like this:

from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'requests==2.19.1',
        'urllib3==1.23',
    ],
    packages=['Dependencies'],
)

I used the setup.py file to build a tar.gz file with 'python setup.py sdist'. The archive is in the dist folder as ResumeParserDependencies-0.1.tar.gz. I then specified

setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz'] in my pipeline options.

However, once I run my pipeline on Dataflow, I get the error 'No module named ResumeParserDependencies'. If I run 'pip install ResumeParserDependencies-0.1.tar.gz' locally, the package installs, and I can see it in the output of 'pip freeze'.


What am I missing to load the package into Dataflow?

I changed my folder structure and got this to work:

Project/
--Pipeline.py
--setup.py
--Module1/
    --__init__.py
--Module2/
    --__init__.py
--Module3/
    --__init__.py

The setup.py file now looks like this:

from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'urllib3==1.23',
    ],
    packages=find_packages(),
)

In my pipeline, I specified:

setup_options.setup_file = './setup.py'

And I no longer needed:

setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']
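For context, here is a minimal sketch of how the setup_file option might be wired up in Pipeline.py. It assumes setup_options above was obtained via options.view_as(SetupOptions), which is the usual Apache Beam idiom but is not shown in the post. The helper names resolve_setup_file and make_options are hypothetical. Resolving the path first avoids depending on the directory the job is launched from, since './setup.py' is relative to the current working directory.

```python
import os


def resolve_setup_file(project_dir):
    """Return the absolute path to setup.py inside project_dir.

    Hypothetical helper: fails early on the launching machine instead of
    letting a bad relative path surface later during job submission.
    """
    path = os.path.abspath(os.path.join(project_dir, "setup.py"))
    if not os.path.isfile(path):
        raise FileNotFoundError("no setup.py found in " + project_dir)
    return path


def make_options(project_dir):
    """Build pipeline options that ship the local package via setup.py.

    Apache Beam is imported lazily so the helper above can be used (and
    checked) on a machine where the SDK is not installed.
    """
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        SetupOptions,
    )

    options = PipelineOptions()
    # Assumed equivalent of `setup_options.setup_file = './setup.py'`.
    options.view_as(SetupOptions).setup_file = resolve_setup_file(project_dir)
    return options
```

Calling make_options(os.path.dirname(os.path.abspath(__file__))) from Pipeline.py would then point at the setup.py that sits next to the pipeline file.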

Reference: find_packages doesn't find my Python file
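The point of that reference is that find_packages() only reports directories that contain an __init__.py. The sketch below is a simplified, standard-library approximation of that rule, not the setuptools implementation; discover_packages and make_layout are hypothetical names. It compares the two layouts from this post as they are listed above.

```python
import os
import tempfile


def discover_packages(root):
    """List dotted names of directories under root that hold __init__.py.

    Simplified stand-in for the find_packages() rule: a directory without
    __init__.py is skipped, and nothing below it is searched.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if dirpath == root:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # do not descend into non-packages
            continue
        found.append(os.path.relpath(dirpath, root).replace(os.sep, "."))
    return found


def make_layout(root, relative_paths):
    """Create empty files at the given '/'-separated paths under root."""
    for rel in relative_paths:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass


if __name__ == "__main__":
    old = ["Pipeline.py", "setup.py", "Dependencies/Module1.py",
           "Dependencies/Module2.py", "Dependencies/Module3.py"]
    new = ["Pipeline.py", "setup.py", "Module1/__init__.py",
           "Module2/__init__.py", "Module3/__init__.py"]
    for label, layout in (("old", old), ("new", new)):
        with tempfile.TemporaryDirectory() as tmp:
            make_layout(tmp, layout)
            print(label, discover_packages(tmp))
```

Under this rule the original layout yields no packages, because Dependencies/ has no __init__.py, while the restructured layout yields Module1, Module2 and Module3.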

This issue is usually caused by a version mismatch between the SDK and the worker dependencies. To resolve it, check your Dataflow SDK version against the worker dependencies listed for that SDK version, and verify that you are running compatible versions.

