
Databricks + H2O PySparkling: addURL Py4JException

I am new to H2O and the Spark framework, and I am having trouble getting H2O + Spark (Sparkling Water) PySparkling running in Databricks. I have a 12-worker cluster running in Databricks in a 1.5.2 environment.

The steps I took were as follows:

1. Attached (installed) the libraries required by H2O (six, requests, tabulate, and future) to my cluster.

2. Took the .egg file from the sparkling-water-1.5.14/py/dist folder after unzipping the sparkling-water-1.5.14.zip package.

3. Attached sparkling-water-assembly-1.5.14.jar to my Databricks cluster.

4. I can import h2o successfully. However, when I run the following cell in my Python notebook in Databricks, I get the exception below:

    # Initiate H2OContext on top of Spark
    from pysparkling import *
    hc = H2OContext(sc).start()
    import h2o

I get the following error:

py4j.Py4JException: Method addURL([class java.net.URL]) does not exist

I would sincerely appreciate any guidance on how to resolve this exception.

This is a bug in PySparkling. A fix has already been committed but is waiting for the next release; it might be included in 1.5.15.

Until the next version is released, you can try building Sparkling Water from that branch yourself and using that build.
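When retrying with a rebuilt package, it may also help to rule out missing Python dependencies as a separate cause of failure. The following is only a minimal sketch, assuming the four packages named in the question (six, requests, tabulate, future) are the ones to check; the helper name `missing_modules` is hypothetical and not part of H2O or PySparkling.

```python
import importlib

# Packages the question lists as required by H2O (taken from the question,
# not from an official requirements file).
REQUIRED = ["six", "requests", "tabulate", "future"]


def missing_modules(names):
    """Return the module names that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


missing = missing_modules(REQUIRED)
if missing:
    print("Missing Python packages: " + ", ".join(missing))
else:
    print("All listed Python packages are importable.")
```

Running this in a notebook cell before creating the H2OContext only checks the driver's Python environment. It does not detect the addURL problem itself, which the answer above attributes to a PySparkling bug.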

