
How to run Mosaic locally (outside Databricks)

I am using Mosaic on Databricks to do some geospatial transformations, and so far it works well.

As my code base grows, I am looking for a way to run unit tests for the geospatial transformations on my local machine. However, I could not get Mosaic to work locally. Here is a minimal example that reproduces the error:

from mosaic import enable_mosaic
from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder.master("local[*]").appName("mylib-tests").getOrCreate()
)
enable_mosaic(spark_session)  # <- Error here

Here is the full log from running enable_mosaic(spark_session), including the error I get:

22/05/18 12:00:23 INFO MosaicLibraryHandler: Looking for Mosaic JAR at /home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/lib/mosaic-0.1.1-jar-with-dependencies.jar.
22/05/18 12:00:23 INFO MosaicLibraryHandler: Automatically attaching Mosaic JAR to cluster.
Traceback (most recent call last):
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-e83e439e0633>", line 1, in <module>
    enable_mosaic(spark_session)
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/api/enable.py", line 47, in enable_mosaic
    _ = MosaicLibraryHandler(config.mosaic_spark)
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/core/library_handler.py", line 29, in __init__
    self.auto_attach()
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/core/library_handler.py", line 76, in auto_attach
    ManagedLibraryId.defaultOrganization(),
TypeError: 'JavaPackage' object is not callable

I suspect something differs between my local setup and the environment on Databricks, but I could not figure out what is missing. Has anyone managed to get Mosaic to work outside Databricks?

I finally managed to make it work.

The problem disappeared when I changed the configuration of the Spark session as follows:

import os

spark_session = (
    SparkSession.builder.master("local[*]")
    .config(
        "spark.jars",
        f"{os.environ['VIRTUAL_ENV']}/lib/python3.9/site-packages/"
        "mosaic/lib/mosaic-0.1.1-jar-with-dependencies.jar",
    )
    .config("spark.databricks.labs.mosaic.jar.autoattach", False)
    .appName("mylib-tests")
    .getOrCreate()
)
enable_mosaic(spark_session)  # works

I hope this helps others get at least something working, although hardcoding a path to a JAR file inside a Python virtual environment does not feel very clean... If you have a better solution, please feel free to post another answer.
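One way to avoid hardcoding the virtualenv path and Python version is to locate the JAR relative to the installed mosaic package itself, since the JAR ships inside the package's lib/ directory. A minimal sketch, assuming that layout holds for your Mosaic version (the helper name find_mosaic_jar is mine, not part of Mosaic):

```python
from pathlib import Path


def find_mosaic_jar(package_dir: str) -> str:
    """Locate the bundled Mosaic JAR under <package_dir>/lib."""
    matches = sorted(
        Path(package_dir).glob("lib/mosaic-*-jar-with-dependencies.jar")
    )
    if not matches:
        raise FileNotFoundError(f"no Mosaic JAR found under {package_dir}/lib")
    # If several versions are present, pick the lexicographically last one.
    return str(matches[-1])


# Usage sketch (with mosaic installed):
#   import mosaic
#   jar = find_mosaic_jar(Path(mosaic.__file__).parent)
#   ...builder.config("spark.jars", jar)...
```

This keeps the session config working regardless of the Python version or whether the environment is a virtualenv, pyenv, or conda environment.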
