Error when building docker image for jupyter spark notebook

I am trying to build a Jupyter notebook image in Docker by following the guide at https://github.com/cordon-thiago/airflow-spark and the build fails with exit code 8. I ran:

$ docker build --rm --force-rm -t jupyter/pyspark-notebook:3.0.1 .

The build stops at this step:

RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json | \
    python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && \
    echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
    tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && \
    rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"

The error message is as follows:


 => ERROR [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin);   2.3s
------
 > [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content[
'preferred']+content['path_info'])") &&     echo "F4A10BAEC5B8FF1841F10651CAC2C4AA39C162D3029CA180A9749149E6060805B5B5DDF9287B4AA321434810172F8CC0534943AC005531BB48B6622FBE228DDC *spark-3.0.1-bin-hadoop2.7.
tgz" | sha512sum -c - &&     tar xzf "spark-3.0.1-bin-hadoop2.7.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-3.0.1-bin-hadoop2.7.tgz":
------
executor failed running [/bin/bash -o pipefail -c wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\
?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") &&     echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_
VERSION}.tgz" | sha512sum -c - &&     tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-${APACHE_SPARK_VERSION}
-bin-hadoop${HADOOP_VERSION}.tgz"]: exit code: 8

I would really appreciate it if someone could shed some light on this. Thanks!

Exit code 8 most likely comes from wget, where it means the server issued an error response. As an example, this path that the Dockerfile tries to wget from isn't valid anymore: https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
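For reference, the exit statuses listed in the GNU wget manual can be written down as a small lookup. This is only a sketch: `describe_wget_exit` is a hypothetical helper name, not something wget or the Dockerfile provides.

```shell
#!/usr/bin/env bash
# Map a GNU wget exit status to the meaning given in the wget manual.
# describe_wget_exit is a hypothetical helper, not part of wget itself.
describe_wget_exit() {
  case "$1" in
    0) echo "no problems occurred" ;;
    1) echo "generic error code" ;;
    2) echo "parse error" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "username/password authentication failure" ;;
    7) echo "protocol errors" ;;
    8) echo "server issued an error response" ;;
    *) echo "unknown exit status" ;;
  esac
}

# The status reported by the failed build step.
describe_wget_exit 8
```

To reproduce the failure outside the build, one option is to run `wget -q --spider` against the URL in question and print `$?` afterwards. A status of 8 there would point at the download URL and not at the checksum or tar steps.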

Judging from the issues on the repo, Apache Spark 3.0.1 no longer seems to be available for download, so you should override the Spark version to 3.0.2 with --build-arg:

docker build --rm --force-rm \
  --build-arg spark_version=3.0.2 \
  -t jupyter/pyspark-notebook:3.0.2 .

Edit

See the comments below for more details. The command that worked was:

docker build --rm --force-rm \
  --build-arg spark_version=3.1.1 \
  --build-arg hadoop_version=2.7 \
  -t jupyter/pyspark-notebook:3.1.1 .  

together with a Spark checksum updated to match version 3.1.1: https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz.sha512
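The checksum matters because the failing RUN step pipes a line of the form `<hash> *<filename>` into `sha512sum -c -`. A minimal, self-contained sketch of that check follows. The archive here is a locally created placeholder file, and the hash is computed on the spot; in a real build the hash would come from the published .sha512 file.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
workdir="$(mktemp -d)"
cd "${workdir}" || exit 1

# Placeholder standing in for the downloaded Spark archive (not a real tarball).
archive="spark-3.1.1-bin-hadoop2.7.tgz"
printf 'placeholder archive contents\n' > "${archive}"

# Assumption: computed locally for the demo instead of taken from Apache.
spark_checksum="$(sha512sum "${archive}" | awk '{print $1}')"

# Same check-line format as the Dockerfile: "<hash> *<filename>".
if echo "${spark_checksum} *${archive}" | sha512sum -c - >/dev/null 2>&1; then
  check_result="OK"
else
  check_result="FAILED"
fi
echo "matching checksum: ${check_result}"

# Changing the file makes the stored hash stale, as happens when the Spark
# version changes but the checksum does not.
printf 'different contents\n' >> "${archive}"
if echo "${spark_checksum} *${archive}" | sha512sum -c - >/dev/null 2>&1; then
  stale_result="OK"
else
  stale_result="FAILED"
fi
echo "stale checksum: ${stale_result}"
```

If the Dockerfile declares `spark_checksum` as a build argument (the `${spark_checksum}` reference in the RUN step suggests it does, though the ARG line is not shown here), the new hash could presumably be supplied with an extra `--build-arg spark_checksum=...` instead of editing the file.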

To keep this answer relevant in the future, the version and checksum will probably need to be updated again for the latest Spark/Hadoop releases.
