
Libraries cannot be found on Dataflow/Apache-beam job launched from CircleCI

I am having serious issues running a python Apache Beam pipeline using a GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give any hint on how to tackle this; I've tried everything, but nothing seems to work.

Basically, I'm running this python Apache Beam pipeline, which runs in Dataflow and uses google-api-python-client-1.12.3. If I run the job on my machine (python3 main.py --runner dataflow --setup_file /path/to/my/file/setup.py), it works fine. If I run this same job from within CircleCI, the Dataflow job is created, but it fails with the message ImportError: No module named 'apiclient'.
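For reference, a minimal setup.py matching the --setup_file flag above might look like the sketch below. The package name and version are placeholders (the original file isn't shown); only the dependency pin comes from the question.

    # Hypothetical minimal setup.py for the --setup_file flag above.
    import setuptools

    setuptools.setup(
        name="my-beam-pipeline",  # placeholder project name
        version="0.0.1",
        packages=setuptools.find_packages(),
        install_requires=[
            # Installed on the Dataflow workers at startup
            "google-api-python-client==1.12.3",
        ],
    )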

By looking at this documentation, I think I should probably use a requirements.txt file explicitly. If I run that same pipeline from CircleCI, but add the --requirements_file argument pointing to a requirements file containing a single line (google-api-python-client==1.12.3), the Dataflow job fails because the workers fail too. In the logs there is first an info message ERROR: Could not find a version that satisfies the requirement wheel (from versions: none), which results in a later error message Error syncing pod somePodIdHere ("dataflow-myjob-harness-rl84_default(somePodIdHere)"), skipping: failed to "StartContainer" for "python" with CrashLoopBackOff: "back-off 40s restarting failed container=python pod=dataflow-myjob-harness-rl84_default(somePodIdHere)". I found this thread, but the solution didn't seem to work in my case.
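As I understand it, when --requirements_file is passed, the listed packages are downloaded on the launching machine and staged for the workers, which then install from that staged cache rather than from PyPI; that would be one explanation for why pip on the worker cannot resolve wheel, since it was never staged. For what it's worth, here is a sketch of the same launch done programmatically, with both staging flags in one place; the project, region, and bucket names are placeholders:

    # Minimal sketch: launching the pipeline with explicit staging options.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-gcp-project",            # placeholder GCP project id
        region="us-central1",                # placeholder region
        temp_location="gs://my-bucket/tmp",  # placeholder staging bucket
        setup_file="./setup.py",
        requirements_file="./requirements.txt",
    )

    with beam.Pipeline(options=options) as pipeline:
        # Placeholder transform; the real pipeline logic goes here.
        _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(print)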

Any help would be really, really appreciated. Thanks a lot in advance!

This question looks very similar to yours. The solution seemed to be to explicitly include the dependencies of your requirements (the transitive dependencies) in your requirements.txt.
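For example, one way to build such a file is to install the pipeline's dependencies into a clean virtualenv and pin the result with pip freeze. The sketch below is illustrative only; the exact package set and version pins will differ in your environment:

    # Illustrative requirements.txt listing google-api-python-client together
    # with its transitive dependencies (pins taken from a hypothetical
    # `pip freeze`; yours will differ).
    google-api-python-client==1.12.3
    google-api-core==1.22.4
    google-auth==1.22.1
    google-auth-httplib2==0.0.4
    httplib2==0.18.1
    six==1.15.0
    uritemplate==3.0.1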

apache beam 2.19.0 not running on cloud dataflow anymore due to Could not find a version that satisfies the requirement setuptools>=40.8
