
Getting error in creating pex from TF-YARN library for distributed training

Since our data resides in Hadoop, we are trying to use the TF-YARN library to train deep learning models with TensorFlow. However, we are getting an error in cluster_pack.upload_env().

Here is the complete error:

ERROR:cluster_pack.packaging:Cannot create pex
Traceback (most recent call last):
  File "/data1/python3.6.10/lib/python3.6/site-packages/cluster_pack/packaging.py", line 144, in pack_in_pex
    indexes=[CRITEO_PYPI_URL] if _is_criteo() else None)
  File "/data1/python3.6.10/lib/python3.6/site-packages/pex/resolver.py", line 803, in resolve_multi
    return list(resolve_request.resolve_distributions(ignore_errors=ignore_errors))
  File "/data1/python3.6.10/lib/python3.6/site-packages/pex/resolver.py", line 500, in resolve_distributions
    raise_type=Unsatisfiable):
  File "/data1/python3.6.10/lib/python3.6/site-packages/pex/resolver.py", line 370, in _run_parallel
    max_jobs=self._max_parallel_jobs
  File "/data1/python3.6.10/lib/python3.6/site-packages/pex/jobs.py", line 219, in execute_parallel
    raise error
pex.resolver.Unsatisfiable: pid: 6749 -> /data1/python3.6.10/bin/python3.6 /tmp/tmpirzknr9r --disable-pip-version-check --isolated --exists-action i -q --no-cache-dir download --dest /tmp/tmp1ezcnpuj/resolved_dists/cp36-cp36m absl-py==0.9.0 alembic==1.4.2 astor==0.8.1 astunparse==1.6.3 async-generator==1.10 attrs==19.3.0 backcall==0.1.0 bleach==3.1.5 cachetools==4.1.1 certifi==2020.4.5.1 certipy==0.1.3 cffi==1.14.0 chardet==3.0.4 cloudpickle==1.3.0 cluster-pack==0.0.9 conda-pack==0.4.0 cryptography==2.9.2 cx-Oracle==7.3.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 entrypoints==0.3 gast==0.3.3 google-auth==1.18.0 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 graphframes==0.6 grpcio==1.30.0 h5py==2.10.0 icc-rt==2020.0.133 idna==2.9 importlib-metadata==1.6.0 intel-openmp==2020.0.133 ipykernel==5.3.0 ipython==7.14.0 ipython-genutils==0.2.0 ipywidgets==7.5.1 jedi==0.17.0 Jinja2==2.11.2 joblib==0.16.0 json5==0.9.4 jsonschema==3.2.0 jupyter-client==6.1.3 jupyter-core==4.6.3 jupyter-telemetry==0.1.0 jupyter-tensorboard==0.2.0 jupyterhub==1.1.0 jupyterlab==2.1.2 jupyterlab-server==1.1.4 Keras==2.4.3 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 kiwisolver==1.2.0 Mako==1.1.2 Markdown==3.2.2 MarkupSafe==1.1.1 matplotlib==3.2.2 mistune==0.8.4 mkl==2019.0 mkl-random==1.0.1.1 nbconvert==5.6.1 nbformat==5.0.6 networkx==2.4 nose==1.3.7 notebook==6.0.3 numpy==1.18.5 oauthlib==3.1.0 opt-einsum==3.2.1 packaging==20.4 pamela==1.0.0 pandas==1.0.4 pandocfilters==1.4.2 parso==0.7.0 pex==2.1.1 pexpect==4.8.0 pickleshare==0.7.5 prometheus-client==0.7.1 prompt-toolkit==3.0.5 protobuf==3.12.2 ptyprocess==0.6.0 py4j==0.10.7 pyarrow==1.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.20 Pygments==2.6.1 pyOpenSSL==19.1.0 pyparsing==2.4.7 pyrsistent==0.16.0 pyspark==2.4.6 python-dateutil==2.8.1 python-editor==1.0.4 python-json-logger==0.1.11 pytz==2020.1 PyYAML==5.3.1 pyzmq==19.0.1 requests==2.23.0 requests-oauthlib==1.3.0 rsa==4.6 ruamel.yaml==0.16.10 ruamel.yaml.clib==0.2.0 scikit-learn==0.23.1 scipy==1.4.1 seaborn==0.10.1 Send2Trash==1.5.0 six==1.15.0 skein==0.8.0 sklearn==0.0 SQLAlchemy==1.3.17 tbb==2019.0 tbb4py==2019.0 tensorboard==2.2.2 tensorboard-plugin-wit==1.7.0 tensorflow==2.2.0 tensorflow-estimator==2.2.0 tensorflowonspark==2.2.1 termcolor==1.1.0 terminado==0.8.3 testpath==0.4.4 tf-yarn==0.5.1 threadpoolctl==2.1.0 tornado==6.0.4 traitlets==4.3.3 urllib3==1.25.9 wcwidth==0.1.9 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wrapt==1.12.1 zipp==3.1.0 raised Executing /data1/python3.6.10/bin/python3.6 /tmp/tmpirzknr9r --disable-pip-version-check --isolated --exists-action i -q --no-cache-dir download --dest /tmp/tmp1ezcnpuj/resolved_dists/cp36-cp36m absl-py==0.9.0 alembic==1.4.2 astor==0.8.1 astunparse==1.6.3 async-generator==1.10 attrs==19.3.0 backcall==0.1.0 bleach==3.1.5 cachetools==4.1.1 certifi==2020.4.5.1 certipy==0.1.3 cffi==1.14.0 chardet==3.0.4 cloudpickle==1.3.0 cluster-pack==0.0.9 conda-pack==0.4.0 cryptography==2.9.2 cx-Oracle==7.3.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 entrypoints==0.3 gast==0.3.3 google-auth==1.18.0 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 graphframes==0.6 grpcio==1.30.0 h5py==2.10.0 icc-rt==2020.0.133 idna==2.9 importlib-metadata==1.6.0 intel-openmp==2020.0.133 ipykernel==5.3.0 ipython==7.14.0 ipython-genutils==0.2.0 ipywidgets==7.5.1 jedi==0.17.0 Jinja2==2.11.2 joblib==0.16.0 json5==0.9.4 jsonschema==3.2.0 jupyter-client==6.1.3 jupyter-core==4.6.3 jupyter-telemetry==0.1.0 jupyter-tensorboard==0.2.0 jupyterhub==1.1.0 jupyterlab==2.1.2 jupyterlab-server==1.1.4 Keras==2.4.3 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 kiwisolver==1.2.0 Mako==1.1.2 Markdown==3.2.2 MarkupSafe==1.1.1 matplotlib==3.2.2 mistune==0.8.4 mkl==2019.0 mkl-random==1.0.1.1 nbconvert==5.6.1 nbformat==5.0.6 networkx==2.4 nose==1.3.7 notebook==6.0.3 numpy==1.18.5 oauthlib==3.1.0 opt-einsum==3.2.1 packaging==20.4 pamela==1.0.0 pandas==1.0.4 pandocfilters==1.4.2 parso==0.7.0 pex==2.1.1 pexpect==4.8.0 pickleshare==0.7.5 prometheus-client==0.7.1 prompt-toolkit==3.0.5 protobuf==3.12.2 ptyprocess==0.6.0 py4j==0.10.7 pyarrow==1.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.20 Pygments==2.6.1 pyOpenSSL==19.1.0 pyparsing==2.4.7 pyrsistent==0.16.0 pyspark==2.4.6 python-dateutil==2.8.1 python-editor==1.0.4 python-json-logger==0.1.11 pytz==2020.1 PyYAML==5.3.1 pyzmq==19.0.1 requests==2.23.0 requests-oauthlib==1.3.0 rsa==4.6 ruamel.yaml==0.16.10 ruamel.yaml.clib==0.2.0 scikit-learn==0.23.1 scipy==1.4.1 seaborn==0.10.1 Send2Trash==1.5.0 six==1.15.0 skein==0.8.0 sklearn==0.0 SQLAlchemy==1.3.17 tbb==2019.0 tbb4py==2019.0 tensorboard==2.2.2 tensorboard-plugin-wit==1.7.0 tensorflow==2.2.0 tensorflow-estimator==2.2.0 tensorflowonspark==2.2.1 termcolor==1.1.0 terminado==0.8.3 testpath==0.4.4 tf-yarn==0.5.1 threadpoolctl==2.1.0 tornado==6.0.4 traitlets==4.3.3 urllib3==1.25.9 wcwidth==0.1.9 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wrapt==1.12.1 zipp==3.1.0 failed with 120

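Creating the pex fails because of one of your dependencies, and you have a lot of them. The best approach is to isolate the dependencies for each use case and create a smaller virtual environment, or to start by trying with tensorflow only.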

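You can run the pex CLI command with different sets of requirements to see exactly which requirement causes the problem. It is also worth checking whether it works with a more recent pex version (tf-yarn currently uses pex==2.1.1).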

pex -r requirements -o myarchive.pex
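
To narrow down the failing requirement without trying each one by hand, the check above can be scripted. The following is only a minimal sketch, not part of tf-yarn or cluster-pack: the helper names (read_requirements, pex_builds, find_failing) are hypothetical, and it assumes the pex executable is on your PATH and that the requirements file holds one pinned requirement per line.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def read_requirements(path: str) -> List[str]:
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = (line.strip() for line in Path(path).read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def pex_builds(requirement: str) -> bool:
    """Try to build a throwaway pex that contains a single requirement."""
    with tempfile.TemporaryDirectory() as tmp:
        output = str(Path(tmp) / "probe.pex")
        # Same CLI as above, with one requirement passed positionally.
        result = subprocess.run(
            ["pex", requirement, "-o", output],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


def find_failing(
    requirements: Iterable[str],
    builds: Callable[[str], bool] = pex_builds,
) -> List[str]:
    """Return the requirements for which the build check fails."""
    return [req for req in requirements if not builds(req)]


if __name__ == "__main__":
    # Usage: python find_failing.py requirements.txt
    for req in find_failing(read_requirements(sys.argv[1])):
        print("pex could not build:", req)
```

Each requirement is resolved on its own, so this can be slow for a long list, and it only finds packages that pex cannot fetch or build in isolation. A conflict between two packages that each resolve fine alone would not show up here.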

As an alternative, you can also check whether it works with conda.

If you create a requirements.txt file containing only the requirements you absolutely need and open an issue at https://github.com/criteo/tf-yarn/issues, I can have a look.
