Unable to run docker image with findspark.init
I've created a Docker image of a program that calls findspark.init(). The program runs well on the local machine. When I try to run the image with docker run -p 5000:5000 imgname:latest, I get the following error:
Traceback (most recent call last):
File "app.py", line 37, in <module>
findspark.init()
File "/usr/local/lib/python3.8/site-packages/findspark.py", line 129, in init
spark_home = find()
File "/usr/local/lib/python3.8/site-packages/findspark.py", line 35, in find
raise ValueError(
ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).
Can anyone suggest a way around this problem? When I try to build the program without the findspark calls, I get other errors related to Spark. This is my Dockerfile:
#Use python as base image
FROM python:3.8
#Use working dir app
WORKDIR /app
#Copy contents of current dir to /app
ADD . /app
#Install required packages
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
#Open port 5000
EXPOSE 5000
#Set environment variable
ENV NAME analytic
#Run python program
CMD python app.py
Here is the part of the code where the image is failing:
### multiple lines of importing libraries and then
# Spark imports
import findspark
findspark.init()
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import functions as F
The requirements.txt file can be seen at this link.
Spark requires Java even if you're only running pyspark, so you need to install Java in your image. In addition, if you're still using findspark, you can specify the SPARK_HOME directory as well:
RUN apt-get update && apt-get install -y default-jre
ENV SPARK_HOME /usr/local/lib/python3.8/site-packages/pyspark
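If you're not sure which directory to point SPARK_HOME at, you can ask Python where the pip-installed pyspark package lives instead of guessing the path. This is a small standard-library sketch; module_home is a hypothetical helper name, not part of findspark or pyspark:

```python
import importlib.util
import os

def module_home(name):
    """Return the install directory of an importable module, or None if absent."""
    spec = importlib.util.find_spec(name)
    return os.path.dirname(spec.origin) if spec and spec.origin else None

# Inside the image built above, this should print something like
# /usr/local/lib/python3.8/site-packages/pyspark
print(module_home("pyspark"))
```

Running this once inside the container (docker run imgname:latest python -c "...") gives you the exact value to bake into the ENV SPARK_HOME line.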
To summarize, your Dockerfile should look like this:
#Use python as base image
FROM python:3.8
RUN apt-get update && apt-get install -y default-jre
#Use working dir app
WORKDIR /app
#Copy contents of current dir to /app
ADD . /app
#Install required packages
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
#Open port 5000
EXPOSE 5000
#Set environment variable
ENV NAME analytic
ENV SPARK_HOME /usr/local/lib/python3.8/site-packages/pyspark
#Run python program
CMD python app.py
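To fail fast with a clearer message than findspark's ValueError, you could also add a small pre-flight check at the top of app.py that verifies the two things this Dockerfile provides: a java executable on PATH and a valid SPARK_HOME. This is an illustrative sketch; spark_preflight is a hypothetical helper, not part of findspark:

```python
import os
import shutil

def spark_preflight():
    """Collect human-readable problems that would make findspark.init() fail."""
    problems = []
    # Spark launches a JVM, so a java binary must be resolvable on PATH.
    if shutil.which("java") is None:
        problems.append("no 'java' executable on PATH; install default-jre in the image")
    # findspark falls back to SPARK_HOME when Spark isn't in a standard location.
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home or not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is unset or does not point to a directory")
    return problems

for msg in spark_preflight():
    print("Spark setup problem:", msg)
```

Calling spark_preflight() before findspark.init() turns an opaque container crash into a message that names the missing piece.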