
Gcloud Internal Error while submitting training inside Docker container

I'm building a Docker container to submit ML training jobs using gcloud. The runnable is actually a Python program, and gcloud is executed via subprocess.check_output. Running the program outside a Docker container works just fine, which makes me wonder whether some dependency is not installed, but gcloud outputs no useful logs at all.

When running gcloud ml-engine jobs submit training, the executable returns exit status 1 and outputs only Internal Error. The logs available on Google Cloud Console are always 5 entries of "Validating job requirements..." with no further information.

The Docker container has the following dependencies installed (some are not relevant to Google Cloud ML but are used by other features in the program):

Via apt-get: python, python-pip, python-dev, libmysqlclient-dev, curl

Via pip install: flask, MySQL-python, configparser, pandas, tensorflow

The gcloud tool itself is installed by downloading the SDK and installing it from the command line:

RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud
RUN tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz
RUN /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin

Account credentials are set up via:

RUN gcloud auth activate-service-account --key-file path-to-keyfile-in-docker-container
RUN gsutil version -l

The last command, gsutil version -l, is pretty much just there to make sure the SDK installation is working.

Does anyone have any clue what might be happening, or how to further debug what might be causing an Internal Error in gcloud?

Thanks in advance! :)

Please make sure all the parameters are set properly, and that all your dependencies are packaged and uploaded properly.

If all of that is done and you still can't run the job, you will need more than just "Internal Error" to pinpoint the issue. Please either contact Google Cloud Platform support or file a bug on the Public Issue Tracker to get further assistance.
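One way to get more than "Internal Error" out of the call might be to capture gcloud's stderr as well as its stdout, and to raise gcloud's own log level. Note that subprocess.check_output only captures stdout by default, so anything gcloud writes to stderr can be lost inside a container. The following is a minimal sketch, not the asker's actual code: the helper names run_and_capture and build_submit_command are hypothetical, and the job name and extra arguments are placeholders to be supplied by the caller. --verbosity is a gcloud-wide flag, and debug is its most detailed level.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr).

    Hypothetical helper: unlike subprocess.check_output, it keeps stderr,
    and it does not raise on a non-zero exit status.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text rather than bytes
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        # Echo the captured stderr so it also ends up in the container logs.
        sys.stderr.write(err)
    return proc.returncode, out, err


def build_submit_command(job_name, extra_args):
    """Build the gcloud command line with debug verbosity (hypothetical helper).

    job_name and extra_args are placeholders; pass whatever flags the
    job already uses.
    """
    base = ["gcloud", "ml-engine", "jobs", "submit", "training", job_name]
    return base + ["--verbosity=debug"] + list(extra_args)
```

Calling run_and_capture(build_submit_command(job_name, extra_args)) with the job's real name and flags should return whatever gcloud printed to both streams, which is also the kind of detail worth attaching to a support request or issue report.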
