
Databricks Spark notebook re-using Scala objects between runs?

I have written an Azure Databricks Scala notebook (based on a JAR library), and I run it using a Databricks job once every hour.

In the code, I use the Application Insights Java SDK for log tracing, and initialize a GUID that marks the "RunId". I do this in a Scala 'object' constructor:

import com.microsoft.applicationinsights.{TelemetryClient, TelemetryConfiguration}

object AppInsightsTracer
{
  // Object initializer: configure Application Insights and tag all events with a RunId
  TelemetryConfiguration.getActive().setInstrumentationKey("...")
  val tracer = new TelemetryClient()
  val properties = new java.util.HashMap[String, String]()
  properties.put("RunId", java.util.UUID.randomUUID.toString)

  def trackEvent(name: String): Unit =
  {
    tracer.trackEvent(name, properties, null)
  }
}

The notebook itself simply calls the code in the JAR:

import com.mypackage._
Flow.go()

I expect to have a different "RunId" every hour. The weird behavior I am seeing is that for all runs, I get exactly the same "RunId" in the logs! As if the Scala object constructor code is run exactly once, and is re-used between notebook runs...

Do Spark/Databricks notebooks retain context between runs? If so, how can this be avoided?

A Databricks notebook, much like a Jupyter notebook, is attached to a Spark session (think of it as a long-running JVM process) that stays alive until it either dies or you restart it explicitly. A Scala object is a singleton: its constructor body runs once per JVM, and that same instance is reused for every execution of the notebook, which is why the RunId generated in the initializer never changes.
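
For illustration, here is a minimal sketch (the RunIdDemo and Main names are made up) of the difference: a val in an object body is evaluated once per JVM, while a def is re-evaluated on every call.

object RunIdDemo
{
  // Evaluated once, the first time RunIdDemo is referenced in this JVM
  val fixedRunId: String = java.util.UUID.randomUUID.toString

  // Evaluated on every call
  def freshRunId(): String = java.util.UUID.randomUUID.toString
}

object Main
{
  def main(args: Array[String]): Unit =
  {
    println(RunIdDemo.fixedRunId)   // same value for the lifetime of the JVM
    println(RunIdDemo.fixedRunId)   // identical to the previous line
    println(RunIdDemo.freshRunId()) // a new UUID
    println(RunIdDemo.freshRunId()) // another new UUID
  }
}

As long as the notebook keeps running against the same cluster JVM, fixedRunId never changes.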

You start with a new context only when you detach and re-attach the notebook (or restart the cluster); otherwise the attached JVM, and any singletons in it, persists between runs.

I would recommend saving the RunId to a file on disk, reading that file at the start of every notebook run, and then incrementing the RunId and writing it back so each run gets a new value.
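
A minimal sketch of that approach, assuming a hypothetical DBFS location (/dbfs/tmp/myapp/run_id.txt) and a numeric RunId. The key point is to call nextRunId() from the run's entry point (for example inside Flow.go()), not from an object initializer, so it is re-evaluated on every run.

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}

object RunIdStore
{
  // Hypothetical path; on Databricks, /dbfs/... is backed by DBFS and survives between runs
  private val path = Paths.get("/dbfs/tmp/myapp/run_id.txt")

  // Read the previous RunId (0 if the file does not exist), increment it, persist it, return it
  def nextRunId(): Long =
  {
    val previous =
      if (Files.exists(path))
        new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim.toLong
      else
        0L
    val next = previous + 1
    Files.createDirectories(path.getParent)
    Files.write(path, next.toString.getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
    next
  }
}

Alternatively, if a fresh GUID per run is enough, generating it inside the method that starts the run (a def, not a val in the object body) avoids the file entirely.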
