
addSparkListener equivalent in Databricks


I want to register a custom SparkListener with Databricks' Spark context.

With plain Spark I can just set the "spark.jars" and "spark.extraListeners" configs during spark-submit, or use the sparkContext.addSparkListener API.
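For reference, this is roughly what that looks like outside Databricks; a minimal sketch, where the jar path, listener class name, and application file are placeholders:

    spark-submit \
      --jars /path/to/my_jar.jar \
      --conf spark.extraListeners=com.example.MySparkListener \
      my_app.py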

For the Databricks setup, I have installed the jar containing the listener on my cluster. When I put the "spark.extraListeners" config in the cluster's "Advanced" configuration tab, the cluster fails to initialize with a "Listener not found" error.

I tried setting it via the SparkSession builder, like this:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("abc") \
        .config("spark.extraListeners", "mySparkListener") \
        .enableHiveSupport() \
        .getOrCreate()

Databricks won't add it. No errors are thrown, but the listener is not added.

Is there any way to do this? Note: I am using Python notebooks on Databricks.

The problem is that when you get into the notebook, SparkSession is already initialized, so your configuration has no effect.

You need to have this setting specified when the cluster is starting. You did it correctly by specifying it in the cluster's Spark conf settings, but the problem is that libraries are installed after Spark has started, so the necessary classes aren't found. You can fix this by adding a cluster init script, something like the one below. You need to have your library installed somewhere on DBFS (I use /FileStore/jars/my_jar.jar as an example):

#!/bin/bash

cp /dbfs/FileStore/jars/my_jar.jar /databricks/jars

This script copies your jar file into the jars directory on the local disk, and this happens before Spark starts.
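With the jar in place before Spark starts, the "spark.extraListeners" setting from the cluster's Spark config tab should then be able to resolve the class. A minimal sketch of that entry, where com.example.MySparkListener is a placeholder for your listener's fully qualified class name:

    spark.extraListeners com.example.MySparkListener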
