Unable to run PySpark in Jupyter Notebook - Linux

I'm trying to run PySpark in a Jupyter Notebook locally, on a server that isn't connected to the internet. I installed PySpark and Java from downloaded package files using the following:

conda install pyspark-3.3.0-pyhd8ed1ab_0.tar.bz2
conda install openjdk-8.0.332-h166bdaf_0.tar.bz2

When I run !java -version in my notebook, I get:

openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu (build 25.332-b09, mixed mode)

When I run !which java, I get:


My code is as follows.

import os

# Point PySpark at the conda-installed Spark distribution and JDK
os.environ['SPARK_HOME'] = "/root/anaconda3/pkgs/pyspark-3.3.0-pyhd8ed1ab_0/site-packages/pyspark"
os.environ['JAVA_HOME'] = "/root/anaconda3"
# Run Spark locally with 2 worker threads
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master local[2] pyspark-shell"

from pyspark import SparkConf, SparkContext

conf = SparkConf().set('spark.driver.host', '')
sc = SparkContext(master='local', appName='Test', conf=conf)
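PySpark launches the JVM by running bin/spark-submit under SPARK_HOME, which in turn uses JAVA_HOME/bin/java when JAVA_HOME is set. The "Java gateway process exited" error is also commonly reported when those paths are wrong, so it can help to rule them out first. A minimal sketch, assuming a Linux layout; check_spark_env is a hypothetical helper, not part of PySpark:

```python
import os
from pathlib import Path


def check_spark_env(env=None):
    """Return a list of path problems with SPARK_HOME / JAVA_HOME (empty if none found).

    Hypothetical helper: it only checks that the expected launcher files exist.
    """
    env = os.environ if env is None else env
    problems = []

    spark_home = env.get("SPARK_HOME")
    # PySpark starts the gateway via $SPARK_HOME/bin/spark-submit on Linux
    if not spark_home or not (Path(spark_home) / "bin" / "spark-submit").is_file():
        problems.append(f"SPARK_HOME={spark_home!r} has no bin/spark-submit")

    java_home = env.get("JAVA_HOME")
    # spark-class uses $JAVA_HOME/bin/java when JAVA_HOME is set
    if not java_home or not (Path(java_home) / "bin" / "java").is_file():
        problems.append(f"JAVA_HOME={java_home!r} has no bin/java")

    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print(problem)
```

If this prints nothing, the paths look fine and the cause is more likely elsewhere in the Java stack trace.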

Here is the error I got (only a snippet, since I'm typing it out manually):

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
Caused by: java.net.UnknownHostException: abc: abc: Name or service not known
Caused by: java.net.UnknownHostException: abc: Name or service not known

RuntimeError: Java gateway process exited before sending its port number

"abc" is my server's hostname. What am I missing here?
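The innermost "Caused by" line shows the JVM failing to resolve abc, the machine's own hostname. A rough way to reproduce that lookup from the same notebook, without starting Spark, is Python's socket module, which goes through the system resolver (/etc/hosts and then DNS, per nsswitch.conf). This only approximates the JVM's lookup rather than following the same code path, and hostname_resolves is a hypothetical helper:

```python
import socket


def hostname_resolves(hostname):
    """Return True if the system resolver maps `hostname` to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    name = socket.gethostname()  # would be "abc" on the server in the question
    print(name, "resolves" if hostname_resolves(name) else "does NOT resolve")
```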

I found out what the problem was.

Based on the error message java.net.UnknownHostException: abc: abc: Name or service not known, I suspected that Java could not resolve my server's hostname, abc. So I added abc to /etc/hosts under the loopback IP, and now I can run PySpark.
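The hosts entry presumably looks something like "127.0.0.1   localhost abc", with abc appended to the existing loopback line (the rest of /etc/hosts varies by distribution). A minimal sketch to confirm the change: the hypothetical helper loopback_names parses hosts-file text and collects every name mapped to a loopback address (127.0.0.0/8 or ::1):

```python
import ipaddress


def loopback_names(hosts_text):
    """Return the set of hostnames mapped to a loopback address in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines whose first field is not a valid IP address
        if addr.is_loopback:
            names.update(fields[1:])
    return names


if __name__ == "__main__":
    import socket
    from pathlib import Path

    hosts = Path("/etc/hosts")
    if hosts.exists():
        # True once the machine's hostname is listed under a loopback IP
        print(socket.gethostname() in loopback_names(hosts.read_text()))
```

Parsing the file directly, rather than asking the resolver, keeps the check independent of DNS, which may simply time out on a server with no network access.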

