ModuleNotFoundError when running an Apache Beam job on Dataflow using setup.py
I have an Apache Beam pipeline that I used to submit to Google Dataflow, and it ran successfully. Over time my code kept growing, and I wanted to structure it into multiple file dependencies. That's why I followed the Apache Beam section on Multiple File Dependencies.

When I structured my code as follows:
root_dir/
    setup.py
    main.py
    __init__.py
    extract/
        __init__.py
        extract.py
it ran fine locally, but when I submitted it to Dataflow I received the following error:

ModuleNotFoundError: No module named 'extract'

My setup.py looks like this:
from setuptools import setup, find_packages

setup(
    name="g_dataflow",
    version="0.1.0",
    install_requires=[
        'google-cloud-storage==1.42.0'
    ],
    packages=find_packages()
)
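As a side check (not from the original post): `find_packages()` only discovers directories that contain an `__init__.py`, so it's worth confirming that the `extract` package is actually picked up. A minimal sketch, using a temporary directory to mimic the layout above:

```python
import os
import tempfile
from setuptools import find_packages

# Recreate the root_dir layout in a throwaway directory and check
# what find_packages() would hand to setup().
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "extract"))
    open(os.path.join(root, "extract", "__init__.py"), "w").close()
    open(os.path.join(root, "setup.py"), "w").close()

    # find_packages() scans for directories containing __init__.py;
    # the root itself is not a package, so only 'extract' is returned.
    print(find_packages(where=root))  # ['extract']
```

If `extract` is missing from this list, the workers never receive the module, which produces exactly the `ModuleNotFoundError` above.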
I tried to follow the Juliaset example from Apache Beam, but with no success. Has anyone faced the same issue before?
I had to add --save_main_session to my command line, and that solved the problem.
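For reference, a sketch of what the submission command might look like with both `--setup_file` and `--save_main_session` set. The project, bucket, and region values here are placeholders, not from the original post:

```shell
# Hypothetical Dataflow submission; substitute your own project/bucket/region.
python main.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/temp \
  --setup_file=./setup.py \
  --save_main_session
```

`--setup_file` tells Dataflow to build and ship the package defined by setup.py to the workers, while `--save_main_session` pickles the state of the main session so that names imported at module level are available on the workers.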