
addSparkListener equivalent in Databricks

I want to register a custom SparkListener with Databricks' Spark context.

With plain Spark I can just use the "spark.jars" and "spark.extraListeners" configs during spark-submit, or use the sparkContext.addSparkListener API.
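For reference, here is roughly what I mean for plain Spark (the jar path and the com.example.MySparkListener class name below are just placeholders):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Placeholder jar path and listener class; run with e.g. `spark-submit my_app.py`.
    # Depending on the deployment, the jar may also need to be on the driver
    # classpath (spark.driver.extraClassPath) so the listener can load at startup.
    conf = (SparkConf()
            .set("spark.jars", "/path/to/my-listener.jar")
            .set("spark.extraListeners", "com.example.MySparkListener"))

    spark = SparkSession.builder.config(conf=conf).getOrCreate()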

For the Databricks setup, I have installed the jar containing the listener on my cluster. When I put the config "spark.extraListeners" in the "Advanced" config tab of the cluster, the cluster fails to initialize, throwing an error that the listener was not found.

I tried setting it via the SparkSession builder, like this:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("abc") \
        .config("spark.extraListeners", "mySparkListener") \
        .enableHiveSupport() \
        .getOrCreate()

Databricks won't add it. No errors are thrown, but the listener is not added.

Is there any way to do this? Note: I am using Python notebooks on Databricks.

The problem is that by the time you get into the notebook, the SparkSession is already initialized, so your configuration has no effect.
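As a quick sketch of what that means in practice (com.example.MySparkListener is just a placeholder name): the builder in a notebook simply returns the session that Databricks already created, and static settings such as spark.extraListeners are only read when the SparkContext is created, so they are silently ignored here.

    from pyspark.sql import SparkSession

    # The builder returns the pre-existing notebook session; the static config
    # below is not applied to the already-running SparkContext.
    spark2 = (SparkSession.builder
              .config("spark.extraListeners", "com.example.MySparkListener")
              .getOrCreate())

    print(spark2 is spark)  # True -- same session the cluster already started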

You need to have this setting specified when the cluster is starting. You did that correctly by putting it in the cluster's Spark conf settings, but the problem is that libraries are installed after Spark has started, so the necessary classes aren't found. You can fix this by adding a cluster init script, something like this; you need to have your library stored somewhere on DBFS (I use /FileStore/jars/my_jar.jar as an example):

    #!/bin/bash

    cp /dbfs/FileStore/jars/my_jar.jar /databricks/jars

This script will copy your jar file into the directory of jars on the local disk, and this happens before Spark starts, so the listener class is already on the classpath when spark.extraListeners is processed.
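As an aside, if you only need to attach a listener to the already-running context from the notebook (the addSparkListener case from the question), a common workaround is sketched below. It relies on PySpark's private _jvm/_jsc handles rather than a public API; com.example.MySparkListener is a placeholder for your class, which must already be on the driver classpath (e.g. via the copied jar) and have a no-argument constructor, and it will only see events fired after registration.

    # Runtime registration sketch -- uses PySpark private attributes, treat as unsupported.
    sc = spark.sparkContext
    listener = sc._jvm.com.example.MySparkListener()  # instantiate the JVM listener object
    sc._jsc.sc().addSparkListener(listener)           # attach it to the live SparkContext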
