
Updating a jar job on Databricks

I have a shared cluster on Databricks that is used by several jobs. When I update the jar corresponding to one of the jobs and then launch the job, the execution on the cluster still uses an old version of the jar.

To clarify, I publish the jar through the Databricks API 2.0.
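For reference, this is roughly what the update looks like (a minimal sketch, assuming the jar is already uploaded to DBFS and is attached with the Libraries API 2.0; host, token, cluster ID, and jar path are placeholders):

```python
# Sketch of the jar update call (host/token/IDs are placeholders).
import requests

HOST = "https://<my-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}
CLUSTER_ID = "<shared-cluster-id>"
JAR_PATH = "dbfs:/FileStore/jars/my-job-assembly.jar"

# Attach the re-uploaded jar to the shared cluster via Libraries API 2.0.
resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers=HEADERS,
    json={"cluster_id": CLUSTER_ID, "libraries": [{"jar": JAR_PATH}]},
)
resp.raise_for_status()
```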

My question is: why does the execution on the cluster always use an old version of the jar when I start my job? Thank you for your help.

The old jar is removed from the cluster only when the cluster is terminated. If you have a shared cluster that never terminates, that never happens. This is a limitation not of Databricks but of Java, which can't unload classes that are already in use (or at least it's very hard to implement reliably).
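In practice, on a running all-purpose cluster a new jar version is only picked up after a restart. A sketch of that workflow, assuming the Libraries API 2.0 and Clusters API 2.0 (host, token, cluster ID, and jar paths are placeholders):

```python
# Sketch: swap the jar on a running cluster, then restart so the JVM reloads classes.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}
CLUSTER_ID = "<shared-cluster-id>"

# Mark the old jar for removal (takes effect only after restart)...
requests.post(
    f"{HOST}/api/2.0/libraries/uninstall", headers=HEADERS,
    json={"cluster_id": CLUSTER_ID,
          "libraries": [{"jar": "dbfs:/FileStore/jars/job-v1.jar"}]},
).raise_for_status()

# ...attach the new jar...
requests.post(
    f"{HOST}/api/2.0/libraries/install", headers=HEADERS,
    json={"cluster_id": CLUSTER_ID,
          "libraries": [{"jar": "dbfs:/FileStore/jars/job-v2.jar"}]},
).raise_for_status()

# ...and restart the cluster so the already-loaded classes are dropped.
requests.post(
    f"{HOST}/api/2.0/clusters/restart", headers=HEADERS,
    json={"cluster_id": CLUSTER_ID},
).raise_for_status()
```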

In most cases it's really not recommended to use a shared cluster, for several reasons (see the job-cluster sketch after this list):

  • it costs significantly more (~4x)
  • tasks affect each other's performance
  • there is a high probability of dependency conflicts, plus the inability to update libraries without affecting other tasks
  • a kind of "garbage" accumulates on the driver nodes
  • ...
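As an alternative, each job can declare its own job cluster, so every run starts from a fresh JVM with whatever jar is currently attached. A sketch using the Jobs API 2.0 (job name, main class, node type, and jar path are hypothetical):

```python
# Sketch: a jar job with its own ephemeral job cluster (Jobs API 2.0).
# Each run gets a new cluster, so the currently attached jar version is loaded.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

job_spec = {
    "name": "my-jar-job",
    "new_cluster": {                      # per-run cluster instead of existing_cluster_id
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "libraries": [{"jar": "dbfs:/FileStore/jars/my-job-assembly.jar"}],
    "spark_jar_task": {"main_class_name": "com.example.MyJobMain"},
}

requests.post(f"{HOST}/api/2.0/jobs/create",
              headers=HEADERS, json=job_spec).raise_for_status()
```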

If you use a shared cluster to get faster execution, I recommend looking into Instance Pools, especially in combination with preloading the Databricks Runtime onto the nodes in the instance pool.
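A sketch of that combination, assuming the Instance Pools API 2.0 and a job cluster that draws its nodes from the pool (pool name, node type, and runtime version are placeholders):

```python
# Sketch: create an instance pool with a preloaded Databricks Runtime,
# then point a job's new_cluster at it so cluster startup mostly skips provisioning.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

pool = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers=HEADERS,
    json={
        "instance_pool_name": "jar-jobs-pool",
        "node_type_id": "i3.xlarge",
        "min_idle_instances": 2,
        "idle_instance_autotermination_minutes": 30,
        "preloaded_spark_versions": ["10.4.x-scala2.12"],
    },
).json()

# A job cluster that takes its nodes from the pool.
new_cluster = {
    "spark_version": "10.4.x-scala2.12",   # should match a preloaded version to benefit
    "instance_pool_id": pool["instance_pool_id"],
    "num_workers": 2,
}
```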

Very weird. I don't think this is production ready if there is no way to have multiple concurrent jar versions inside a cluster; loading one jar per job is both mandatory and convenient to use like this. Unfortunately there is no way to overcome this issue.
