I am using Mosaic on Databricks for some geospatial transformations, and so far it works well.
As my code base grows, I am looking for a way to run unit tests for the geospatial transformations on my local machine. However, I could not get Mosaic to work locally. Here is a minimal example that reproduces the error I get:
from mosaic import enable_mosaic
from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder.master("local[*]").appName("mylib-tests").getOrCreate()
)
enable_mosaic(spark_session)  # <- Error here
Here is the full log from running enable_mosaic(spark_session), including the error I get:
22/05/18 12:00:23 INFO MosaicLibraryHandler: Looking for Mosaic JAR at /home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/lib/mosaic-0.1.1-jar-with-dependencies.jar.
22/05/18 12:00:23 INFO MosaicLibraryHandler: Automatically attaching Mosaic JAR to cluster.
Traceback (most recent call last):
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-e83e439e0633>", line 1, in <module>
    enable_mosaic(spark_session)
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/api/enable.py", line 47, in enable_mosaic
    _ = MosaicLibraryHandler(config.mosaic_spark)
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/core/library_handler.py", line 29, in __init__
    self.auto_attach()
  File "/home/nicolas/.pyenv/versions/myenv/lib/python3.9/site-packages/mosaic/core/library_handler.py", line 76, in auto_attach
    ManagedLibraryId.defaultOrganization(),
TypeError: 'JavaPackage' object is not callable
I guess something differs between my local setup and the environment on Databricks, but I could not find what is missing. Has anyone managed to get Mosaic to work outside of Databricks?
I finally managed to make it work.
The problem disappeared when I changed the configuration of the Spark session as follows (note the added import os, which the original snippet was missing):
import os

from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder.master("local[*]")
    .config(
        "spark.jars",
        f"{os.environ['VIRTUAL_ENV']}/lib/python3.9/site-packages/"
        "mosaic/lib/mosaic-0.1.1-jar-with-dependencies.jar",
    )
    .config("spark.databricks.labs.mosaic.jar.autoattach", False)
    .appName("mylib-tests")
    .getOrCreate()
)
enable_mosaic(spark_session) # works
I hope this helps others get at least something that works, although I don't find it very clean to hard-code a reference to a jar file inside a Python virtual environment... If you have a better solution, please feel free to leave another answer.
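One way to avoid hard-coding the site-packages path is to locate the jar programmatically from the installed mosaic package itself. The helper below, find_package_jar, is a hypothetical sketch (not part of Mosaic's API); it assumes the jar ships inside the installed package, under mosaic/lib/ as the auto-attach log above suggests:

```python
import importlib.util
from pathlib import Path


def find_package_jar(package: str, pattern: str = "*.jar") -> str:
    """Return the path to the first jar bundled inside an installed package.

    Raises ModuleNotFoundError if the package is not installed, and
    FileNotFoundError if no jar matching `pattern` ships with it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {package!r} is not installed")
    pkg_dir = Path(spec.origin).parent
    # Search the package directory recursively for bundled jars.
    jars = sorted(pkg_dir.rglob(pattern))
    if not jars:
        raise FileNotFoundError(f"no {pattern} found under {pkg_dir}")
    return str(jars[0])


# Usage sketch (assumes the `mosaic` package is installed locally):
# jar_path = find_package_jar("mosaic")
# spark_session = (
#     SparkSession.builder.master("local[*]")
#     .config("spark.jars", jar_path)
#     .config("spark.databricks.labs.mosaic.jar.autoattach", False)
#     .appName("mylib-tests")
#     .getOrCreate()
# )
```

This keeps the test setup independent of the Python version and of whether the environment lives in VIRTUAL_ENV or somewhere else, at the cost of a small amount of glue code.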