I am new to H2O and the Spark framework, and I am having trouble getting H2O + Spark (Sparkling Water) PySparkling
up and running in Databricks. I have a 12-worker cluster running in Databricks in a Spark 1.5.2 environment.
The steps I took were as follows:
1. Attached (installed) the libraries required by H2O (six, requests, tabulate, and future) to my cluster.
2. Took the necessary .egg file from the sparkling-water-1.5.14/py/dist folder, after unzipping the sparkling-water-1.5.14.zip package.
3. Also attached sparkling-water-assembly-1.5.14.jar to my Databricks cluster.
I am able to import h2o successfully. However, when I run the following cell in my Python notebook in Databricks, I get the exception shown below:
# Initiate H2OContext on top of Spark
from pysparkling import *
hc = H2OContext(sc).start()
import h2o
This is the error I get:
py4j.Py4JException: Method addURL([class java.net.URL]) does not exist
I would sincerely appreciate any guidance on how to resolve this exception.
This is a bug in PySparkling. A fix has already been committed, but it is still waiting for the next release; it might be included in 1.5.15.
In the meantime, you can try building Sparkling Water from that branch yourself and using that build until we release the next version.
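Until a fixed build is in place, a notebook cell can at least recognize this specific failure and report it more clearly. The following is only a minimal sketch and not part of PySparkling: the helper names `is_known_addurl_bug` and `start_h2o_context` are hypothetical, and it assumes that the string form of the raised exception contains the message quoted in the question.

```python
# Message text copied from the error quoted in the question.
ADDURL_BUG_MARKER = "Method addURL([class java.net.URL]) does not exist"


def is_known_addurl_bug(error):
    """Return True if the exception text matches the addURL error from the question."""
    return ADDURL_BUG_MARKER in str(error)


def start_h2o_context(start_fn):
    """Call start_fn() and raise a clearer error if the known addURL failure occurs.

    start_fn is a hypothetical zero-argument callable supplied by the caller,
    for example: lambda: H2OContext(sc).start()
    """
    try:
        return start_fn()
    except Exception as error:
        if is_known_addurl_bug(error):
            # Assumption: this message identifies the PySparkling bug described above.
            raise RuntimeError(
                "H2OContext failed with the known PySparkling addURL bug; "
                "a build that contains the fix is needed. Original error: "
                + str(error)
            )
        # Any other failure is passed through unchanged.
        raise
```

In a notebook it would be called as `hc = start_h2o_context(lambda: H2OContext(sc).start())`, so that unrelated errors still surface as they normally would.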