How can I install a Python package onto Google Dataflow and import it into my pipeline?
My folder structure is as follows:
Project/
--Pipeline.py
--setup.py
--dist/
----ResumeParserDependencies-0.1.tar.gz
--Dependencies/
----Module1.py
----Module2.py
----Module3.py
My setup.py file looks like this:
from setuptools import setup, find_packages

setup(name='ResumeParserDependencies',
      version='0.1',
      description='Dependencies',
      install_requires=[
          'google-cloud-storage==1.11.0',
          'requests==2.19.1',
          'urllib3==1.23'
      ],
      packages=['Dependencies']
      )
I used the setup.py file to create a tar.gz file by running 'python setup.py sdist'. The tar file is in the dist folder as ResumeParserDependencies-0.1.tar.gz. I then specified
setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']
in my pipeline options.
However, once I run my pipeline on Dataflow, I get the error 'No module named ResumeParserDependencies'. If I run 'pip install ResumeParserDependencies-0.1.tar.gz' locally, the package installs, and I can see it using 'pip freeze'.
What am I missing to load the package into Dataflow?
I changed my folder structure and got this to work:
Project/
--Pipeline.py
--setup.py
--Module1/
----__init__.py
--Module2/
----__init__.py
--Module3/
----__init__.py
The setup.py file now looks like this:
from setuptools import setup, find_packages

setup(name='ResumeParserDependencies',
      version='0.1',
      description='Dependencies',
      install_requires=[
          'google-cloud-storage==1.11.0',
          'urllib3==1.23'
      ],
      packages=find_packages()
      )
In my pipeline, I specified:
setup_options.setup_file = './setup.py'
And I didn't need:
setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']
Reference: find_packages doesn't find my Python file
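One plausible reason the new layout works is that find_packages() only reports directories that contain an __init__.py file. The sketch below is a rough standard-library approximation of that discovery rule, not the actual setuptools implementation; the helper names discover_packages and make_demo_tree and the temporary directory layout are hypothetical.

```python
import os
import tempfile


def discover_packages(root):
    """Return dotted names of directories under root that contain
    __init__.py, roughly mimicking setuptools.find_packages()."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root:
            if "__init__.py" not in filenames:
                # Not a package: do not descend into it either.
                dirnames[:] = []
                continue
            rel = os.path.relpath(dirpath, root)
            packages.append(rel.replace(os.sep, "."))
        # Directory names containing a dot cannot be package names.
        dirnames[:] = [d for d in dirnames if "." not in d]
    return sorted(packages)


def make_demo_tree(root):
    """Build a tiny tree that mixes the old and new layouts."""
    # Old layout: Dependencies/ holds a module but no __init__.py.
    os.makedirs(os.path.join(root, "Dependencies"))
    open(os.path.join(root, "Dependencies", "Module1.py"), "w").close()
    # New layout: Module1/ is a package because it has __init__.py.
    os.makedirs(os.path.join(root, "Module1"))
    open(os.path.join(root, "Module1", "__init__.py"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        print(discover_packages(root))  # ['Module1']
```

Only Module1 is reported; the Dependencies directory is skipped because it has no __init__.py.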
This issue usually comes from a version mismatch between the SDK and the worker dependencies. To solve it, check your Dataflow SDK version and the worker dependencies listed for that SDK version to verify that you are running compatible versions.