
Error: ModuleNotFoundError: No module named 'pyspark' while running PySpark in Docker

I'm getting the following error:

Traceback (most recent call last):
  File "/opt/application/main.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'

This happens while running PySpark in Docker.
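main.py itself is not shown in the question, but it presumably starts like the minimal sketch below: the shebang is implied by the chmod +x and exec-form CMD in the Dockerfile, the line-6 import comes straight from the traceback, and the job itself is purely illustrative.

#!/usr/bin/env python
# Minimal sketch of main.py; only the shebang and the pyspark import
# are known from the question, the job below is hypothetical.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("example-app").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # trivial job to exercise Spark
sc.stop()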

And my Dockerfile is as follows:

FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36
RUN yum -y install wget
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN pip3.6 install numpy
RUN pip3.6 install pandas
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
      && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
RUN yum -y install java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
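For reproduction, the image can be built and run like this (the pyspark-app tag is just an assumed name); the container then fails on startup with the traceback above:

docker build -t pyspark-app .
docker run --rm pyspark-app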

You forgot to install pyspark in your Dockerfile.

FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36
RUN yum -y install wget
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN pip3.6 install numpy
RUN pip3.6 install pandas
RUN pip3.6 install pyspark  # add this line.
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
      && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
ENV SPARK_HOME=/usr/local/bin/spark
RUN yum -y install java-1.8.0-openjdk
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
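After rebuilding, a quick sanity check (reusing the assumed pyspark-app tag from above) should confirm that the module now resolves:

docker run --rm pyspark-app python -c "import pyspark; print(pyspark.__version__)"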

Edit: Dockerfile improvement:

FROM centos
ENV DAEMON_RUN=true
ENV SPARK_VERSION=2.4.7
ENV HADOOP_VERSION=2.7
WORKDIR /opt/application
RUN yum -y install python36 wget java-1.8.0-openjdk  # python36, wget and the JDK can be installed in one layer
ENV PYSPARK_PYTHON python3.6
ENV PYSPARK_DRIVER_PYTHON python3.6
RUN ln -s /usr/bin/python3.6 /usr/local/bin/python
RUN wget https://bootstrap.pypa.io/get-pip.py \
    && python get-pip.py \
    && pip3.6 install numpy==1.19 pandas==1.1.5 pyspark==3.0.2  # pin the versions you need; pandas 1.2.x does not support Python 3.6
RUN wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
      && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
# the tarball is extracted and renamed inside WORKDIR, so SPARK_HOME should point there, not /usr/local/bin/spark
ENV SPARK_HOME=/opt/application/spark
ENV JAVA_HOME /usr/lib/jvm/jre
COPY main.py .
RUN chmod +x /opt/application/main.py
CMD ["/opt/application/main.py"]
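One caveat: the pip-installed pyspark==3.0.2 bundles its own Spark runtime, while the tarball unpacked above is Spark 2.4.7. If the standalone 2.4.7 distribution is the one meant to run the job, pinning pyspark==2.4.7 instead (or skipping the tarball download and relying on the pip package alone) keeps the two versions from drifting apart.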
