Why did I encounter an "Error syncing pod" with my Dataflow pipeline?
I experience a weird error with my Dataflow pipeline when I want to use a specific library from PyPI. I need jsonschema in a ParDo, so in my requirements.txt file I added jsonschema==3.2.0. I launch my pipeline with the command line below:
python -m gcs_to_all \
--runner DataflowRunner \
--project <my-project-id> \
--region europe-west1 \
--temp_location gs://<my-bucket-name>/temp/ \
--input_topic "projects/<my-project-id>/topics/<my-topic>" \
--network=<my-network> \
--subnetwork=<my-subnet> \
--requirements_file=requirements.txt \
--experiments=allow_non_updatable_job \
--streaming
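For reference, the requirements.txt described above contains a single pinned line (this sketch assumes nothing else is in the file):

```
jsonschema==3.2.0
```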
In the terminal, everything seems to be fine:
INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
There's no error in the log tab on the Dataflow web page, but in Stackdriver:
message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
I found this error too (at the INFO level):
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Looking in links: /var/opt/google/staged
Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Installing build dependencies: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
cwd: None
Complete output (5 lines):
Looking in links: /var/opt/google/staged
Collecting setuptools>=40.6.0
Collecting wheel
ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
ERROR: No matching distribution found for wheel
But I don't know why it can't get this dependency...
Do you have any idea how I can debug this, or why I encounter this error?
Thanks
When Dataflow workers start, they execute several steps:

1. Install packages from requirements.txt
2. Install packages specified as extra_packages
3. Install the workflow tarball and execute actions provided in setup.py

The Error syncing pod with CrashLoopBackOff message can be related to a dependency conflict. You need to verify that there are no conflicts with the libraries and versions used for the job. Please refer to the documentation for staging required dependencies of the pipeline.
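One way to narrow this down is to reproduce the stage-then-install process locally (a sketch; the staging directory name is illustrative, and it assumes pip is available). The SDK stages source distributions only, and the worker then installs offline from that directory, which is why build-time dependencies such as wheel must also be resolvable from it:

```shell
# Assumption: requirements.txt holds the single pin from the question.
printf 'jsonschema==3.2.0\n' > requirements.txt

# Step 1: stage source distributions only (no prebuilt wheels).
python -m pip download -r requirements.txt --dest /tmp/staged --no-binary :all:

# Step 2: simulate the worker's offline install from the staged directory.
# If this step fails locally, it will also fail on the Dataflow workers.
python -m pip install --no-index --find-links /tmp/staged -r requirements.txt
```

If step 2 reproduces the "No matching distribution found for wheel" error, the problem is in the staged dependency set rather than in the worker environment.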
Also, take a look at the preinstalled dependencies and this StackOverflow thread.
What you can try is to change the version of jsonschema and run the pipeline again. If that doesn't help, please provide your requirements.txt file.

I hope this helps.
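One commonly reported workaround for this particular pip failure (an assumption on my part, not verified against your job) is to list the build-time dependencies explicitly in requirements.txt, so they are staged alongside the runtime packages and the offline install on the worker can resolve them:

```
setuptools>=40.6.0
wheel
jsonschema==3.2.0
```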
There is a playbook for this error: https://cloud.google.com/dataflow/docs/guides/common-errors#error-syncing-pod