
Unable to run docker image with findspark.init

I've created a docker image of a program that has the findspark.init() function in it. The program runs well on the local machine. When I try to run the image with docker run -p 5000:5000 imgname:latest, I get the following error:

Traceback (most recent call last):
  File "app.py", line 37, in <module>
    findspark.init()
  File "/usr/local/lib/python3.8/site-packages/findspark.py", line 129, in init
    spark_home = find()
  File "/usr/local/lib/python3.8/site-packages/findspark.py", line 35, in find
    raise ValueError(
ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).

Can anyone suggest a way around this problem? When I try to run the program without the findspark function, I get other errors related to Spark. This is my Dockerfile:

#Use python as base image
FROM python:3.8

#Use working dir app
WORKDIR /app

#Copy contents of current dir to /app
ADD . /app

#Install required packages
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

#Open port 5000
EXPOSE 5000

#Set environment variable
ENV NAME analytic

#Run python program
CMD python app.py

Here is the part of the code where the image is stalling:

    ### multiple lines of importing libraries and then    
    # Spark imports
    import findspark
    findspark.init()
    
    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import *
    from pyspark.sql.types import *
    from pyspark.sql import functions as F

The requirements.txt file can be seen at this link.

Spark requires Java even if you're running pyspark, so you need to install Java in your image. In addition, if you're still using findspark, you can specify the SPARK_HOME directory as well:

RUN apt-get update && apt-get install -y default-jre
ENV SPARK_HOME /usr/local/lib/python3.8/site-packages/pyspark
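Alternatively (a sketch of an option, not part of the original answer), findspark.init() also accepts the Spark location as an argument, so the same path can be passed directly in app.py instead of relying on the SPARK_HOME environment variable:

    # Sketch: point findspark at the pip-installed pyspark package directly.
    # The path below matches the SPARK_HOME value set in the Dockerfile snippet above.
    import findspark
    findspark.init("/usr/local/lib/python3.8/site-packages/pyspark")

    import pyspark  # should now import with the Spark location resolved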

To summarize, your Dockerfile should look like:

#Use python as base image
FROM python:3.8

RUN apt-get update && apt-get install -y default-jre

#Use working dir app
WORKDIR /app

#Copy contents of current dir to /app
ADD . /app

#Install required packages
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

#Open port 5000
EXPOSE 5000

#Set environment variable
ENV NAME analytic
ENV SPARK_HOME /usr/local/lib/python3.8/site-packages/pyspark

#Run python program
CMD python app.py
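
With those changes in place, the image can be rebuilt and started with the same command from the question (assuming the imgname:latest tag):

docker build -t imgname:latest .
docker run -p 5000:5000 imgname:latest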
