I have written an Azure Databricks Scala notebook (backed by a JAR library), and I run it as a Databricks job once every hour.
In the code, I use the Application Insights Java SDK for log tracing, and initialize a GUID that marks the "RunId". I do this in a Scala 'object' constructor:
import com.microsoft.applicationinsights.{TelemetryClient, TelemetryConfiguration}

object AppInsightsTracer {
  TelemetryConfiguration.getActive().setInstrumentationKey("...")
  val tracer = new TelemetryClient()
  val properties = new java.util.HashMap[String, String]()
  properties.put("RunId", java.util.UUID.randomUUID.toString)

  def trackEvent(name: String): Unit = {
    tracer.trackEvent(name, properties, null)
  }
}
The notebook itself simply calls the code in the JAR:
import com.mypackage._
Flow.go()
I expect to have a different "RunId" every hour. The weird behavior I am seeing is that for all runs, I get exactly the same "RunId" in the logs! As if the Scala object constructor code is run exactly once, and is re-used between notebook runs...
Do Spark/Databricks notebooks retain context between runs? If so how can this be avoided?
A Databricks notebook attaches to a Spark session (think of it as a long-running JVM process) and keeps it alive until it either dies or you restart it explicitly. A Scala 'object' is a singleton: its constructor body runs once per JVM, so the same instance (and the same RunId) is reused across all executions of the notebook on that cluster.
You only start with a new context when you reattach the notebook or restart the cluster.
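The difference can be seen with a minimal, SDK-free sketch (names are illustrative, not from the question's code): a 'val' in an 'object' is computed once per JVM, while a 'def' is evaluated on every call, so moving the UUID generation into a method yields a fresh value each time.

```scala
// Illustrative only: demonstrates Scala object initialization semantics.
object IdSource {
  // Evaluated once, the first time IdSource is touched in this JVM --
  // this is exactly how the RunId in the question behaves.
  val fixedId: String = java.util.UUID.randomUUID.toString

  // Evaluated on every call, so each invocation returns a new UUID.
  def freshId(): String = java.util.UUID.randomUUID.toString
}
```

On a long-lived Databricks cluster, 'fixedId' is set once and then reused by every scheduled run that lands on the same JVM, while 'freshId()' would give each run its own value.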
I would recommend persisting the RunId to a file on disk: read it at the start of every notebook run, increment it, and write it back.
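A sketch of that suggestion, assuming a plain local file path (on Databricks you would point it at a DBFS-mounted path instead; the object and method names here are illustrative):

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch: persist a run counter to a file and bump it at the
// start of every notebook run, so each run gets a distinct RunId even
// though the JVM (and any singletons in it) survives between runs.
object RunCounter {
  def nextRunId(pathStr: String): Long = {
    val path = Paths.get(pathStr)
    // Read the previous counter if the file exists, otherwise start at 0.
    val current =
      if (Files.exists(path)) new String(Files.readAllBytes(path)).trim.toLong
      else 0L
    val next = current + 1
    // Persist the incremented counter for the next run to pick up.
    Files.write(path, next.toString.getBytes)
    next
  }
}
```

Calling 'RunCounter.nextRunId(...)' at the top of the notebook (rather than in an object constructor) guarantees the increment happens on every run, not just on JVM startup.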