Libraries cannot be found on Dataflow/Apache Beam job launched from CircleCI
I am having serious issues running a Python Apache Beam pipeline on the GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give any hint on how to tackle this; I've tried everything I can think of, but nothing seems to work.
Basically, I'm running a Python Apache Beam pipeline which runs in Dataflow and uses google-api-python-client-1.12.3. If I run the job from my machine (python3 main.py --runner dataflow --setup_file /path/to/my/file/setup.py), it works fine. If I run this same job from within CircleCI, the Dataflow job is created, but it fails with the message ImportError: No module named 'apiclient'.
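For context, the setup.py I pass via --setup_file is essentially the following minimal sketch (the package name and layout are illustrative placeholders, not my actual project):

```python
# setup.py -- minimal sketch of what gets passed via --setup_file
# (the package name and version are illustrative placeholders)
import setuptools

setuptools.setup(
    name="my-pipeline",
    version="0.0.1",
    packages=setuptools.find_packages(),
    install_requires=[
        # the client library the pipeline imports
        "google-api-python-client==1.12.3",
    ],
)
```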
By looking at this documentation, I think I should probably use a requirements.txt file explicitly. If I run that same pipeline from CircleCI, but point the --requirements_file argument at a requirements file containing a single line (google-api-python-client==1.12.3), the Dataflow job fails because the workers fail too. In the logs there is first an info message ERROR: Could not find a version that satisfies the requirement wheel (from versions: none), which results in a later error message "Error syncing pod somePodIdHere (\"dataflow-myjob-harness-rl84_default(somePodIdHere)\"), skipping: failed to \"StartContainer\" for \"python\" with CrashLoopBackOff: \"back-off 40s restarting failed container=python pod=dataflow-myjob-harness-rl84_default(somePodIdHere)\".
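In case it's relevant: my understanding (from reading the Beam SDK's stager code, so treat the exact flags as an assumption) is that --requirements_file makes the SDK populate a staging cache at submission time with something like the command below, and the workers then install offline from that cache, which would explain why a build-time dependency such as wheel has to be resolvable from the cache itself:

```
# Roughly what the Beam SDK runs at job submission to build the
# requirements cache (my paraphrase; exact flags are an assumption)
pip download --dest /tmp/dataflow-requirements-cache \
    -r requirements.txt --exists-action i --no-binary :all:
```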
I found this thread, but the solution didn't seem to work in my case.

Any help would be really, really appreciated. Thanks a lot in advance!
This question looks very similar to yours. The solution seemed to be to explicitly include the dependencies of your requirements (transitive dependencies included) in your requirements.txt. That would also be consistent with the wheel error you're seeing: if the staged cache only contains google-api-python-client itself, anything else pip needs to resolve on the worker won't be found.
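For example, instead of a single line, the requirements file would spell out the client library's dependencies as well. A sketch (the version pins are illustrative; the authoritative list comes from running pip freeze in a clean virtualenv where only google-api-python-client==1.12.3 is installed):

```
# requirements.txt -- dependencies listed explicitly, not just the
# top-level package; pins are illustrative, generate the real list
# with `pip freeze` in a clean virtualenv
google-api-python-client==1.12.3
google-api-core==1.22.4
google-auth==1.22.1
google-auth-httplib2==0.0.4
httplib2==0.18.1
six==1.15.0
uritemplate==3.0.1
```

If the wheel error persists, explicitly adding setuptools and wheel lines is the fix suggested in the related question below.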
apache beam 2.19.0 not running on cloud dataflow anymore due to Could not find a version that satisfies the requirement setuptools>=40.8