
How do I set JAVA_HOME using a Dockerfile and Python?

I am trying to set up a Dockerfile for my project and am unsure how to set JAVA_HOME within the container.

FROM python:3.6
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
# Define environment variable
ENV NAME Netflow
# Run netflow.py
CMD ["python", "netflow.py"]

In the requirements.txt I have...

numpy
pandas
kafka
pyspark
log

My netflow.py file is...

import pandas, math, re, log
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext


conf = SparkConf().setAppName("building a warehouse")
sc = SparkContext(conf=conf)
df=pandas.read_csv(r'TestDataSet.csv') 

The output in the terminal after trying to run it is....

JAVA_HOME is not set
Traceback (most recent call last):
  File "netflow.py", line 7, in <module>
    sc = SparkContext(conf=conf)
  File "/usr/local/lib/python3.6/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/usr/local/lib/python3.6/site-packages/pyspark/context.py", line 298, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/usr/local/lib/python3.6/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")

I have been looking for a solution but none have worked so far.

I have tried

ENV JAVA_HOME /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home

and I have tried using a separate command

docker run -e "JAVA_HOME=/Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home" project env

I am still getting the same error

You need to actually install Java inside your container. However, I would suggest finding an existing PySpark Docker image, or adding Python to one of the OpenJDK images, so that you don't need to manage as many environment variables yourself.

More specifically, JAVA_HOME=/Library/Java/JavaVirtualMachines is a path that only exists on your Mac, and it shouldn't be expected to work inside a Linux container.
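One way to see this from inside the container is a small preflight check run before the SparkContext is created. The following is only a sketch: the helper name java_home_problem is hypothetical (it is not part of PySpark), and it assumes the usual JDK layout in which the java executable lives at bin/java under JAVA_HOME.

```python
import os


def java_home_problem(java_home):
    """Describe what is wrong with java_home, or return None if it looks usable.

    Hypothetical helper for illustration; assumes the JDK layout <JAVA_HOME>/bin/java.
    """
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        # This is the case for a macOS path such as /Library/Java/... in a Linux container.
        return "JAVA_HOME points to a directory that does not exist: " + java_home
    java_bin = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
        return "no executable bin/java under " + java_home
    return None
```

Calling java_home_problem(os.environ.get("JAVA_HOME")) at the top of netflow.py and printing the result would show whether the variable is missing or points at a directory that does not exist in the image, which is a more direct hint than the gateway exception.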

However, it's not clear why you need PySpark when pandas is the only thing actually reading your data.

To set environment variables, you can declare them in your Dockerfile like so:

ENV JAVA_HOME="foo"

or

ENV JAVA_HOME foo

In fact, you already set an environment variable in the example you posted.

See documentation for more details.
