
In Databricks, how to automate notebook runs

I have multiple datasets in Databricks that are updated on inconsistent schedules: database.A, database.B, and database.C.

  • database.A : is updated on the first of every month (i.e. 1/1/2022, 2/1/2022, etc.), but sometimes has mid-month updates (i.e. 3/14/2022, 4/12/2022, etc.)
  • database.B : is updated on the fifth of every month
  • database.C : is updated on the first of every quarter (i.e. 1/1/2022, 4/1/2022, etc.), but sometimes has a mid-quarter update (i.e. 5/1/2022, etc.)

My goal is to create a notebook that runs processes when the data is updated in any of these datasets. For example:

data.updated.A <- some_code_or_function(database.A)
data.updated.B <- some_code_or_function(database.B)
data.updated.C <- some_code_or_function(database.C)

case when data.updated.A = TRUE or data.updated.B = TRUE or data.updated.C = TRUE
  then run_notebook
  else do_nothing_and_send_signal_1_day_from_now
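The check-then-run logic above can be sketched in Python (a common choice for Databricks notebooks). The catalog dictionary and the `last_update` helper here are stand-ins: on Databricks you would typically obtain a table's last modification time by querying it, e.g. `DESCRIBE HISTORY database.A` for a Delta table.

```python
from datetime import datetime

def last_update(table: str, catalog: dict) -> datetime:
    # Stand-in lookup. In a real notebook you would query the table's
    # last commit time instead, e.g. via DESCRIBE HISTORY on a Delta table.
    return catalog[table]

def any_table_updated(tables, catalog, last_run: datetime) -> bool:
    """Return True if any monitored table changed since the last run."""
    return any(last_update(t, catalog) > last_run for t in tables)

# Example data mirroring the update schedules described above.
catalog = {
    "database.A": datetime(2022, 3, 14),  # mid-month update
    "database.B": datetime(2022, 3, 5),
    "database.C": datetime(2022, 1, 1),
}
last_run = datetime(2022, 3, 10)

should_run = any_table_updated(
    ["database.A", "database.B", "database.C"], catalog, last_run
)
# database.A changed on 3/14, after the last run, so should_run is True;
# the notebook body would go in the branch guarded by this flag.
```

The same pattern works from SQL or R; the key idea is comparing each table's last-update timestamp against the timestamp of the previous successful run.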

Any ideas? Full disclosure: I am relatively new to Databricks, so I may not know whether I need to switch from SQL to Scala, Python, or R, and am fully willing to. Should I consider another tactic besides scheduled processes? Thanks.

You can run the notebook as a job and schedule it with a cron expression: https://docs.databricks.com/jobs.html#create-a-job
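As a sketch of what such a scheduled job looks like, here is a Python dict shaped like a Jobs API create-job request with a cron schedule. The notebook path and job name are placeholders, and the exact field names should be verified against the Jobs API docs linked above for your workspace version; Databricks schedules use Quartz cron syntax.

```python
import json

# Sketch of a create-job payload that runs a notebook on a cron schedule.
# Field names follow the Databricks Jobs API; verify against the docs.
job_payload = {
    "name": "refresh-on-data-update",          # placeholder job name
    "tasks": [
        {
            "task_key": "run_refresh_notebook",
            "notebook_task": {
                # Hypothetical notebook path
                "notebook_path": "/Users/me/refresh_notebook",
            },
        }
    ],
    # Quartz cron fields: sec min hour day-of-month month day-of-week.
    # This example fires at 06:00 on the 5th of every month.
    "schedule": {
        "quartz_cron_expression": "0 0 6 5 * ?",
        "timezone_id": "UTC",
    },
}

payload_json = json.dumps(job_payload)
```

Because the question's tables update on irregular dates, a common compromise is to schedule the job daily (e.g. `0 0 6 * * ?`) and let the notebook itself check whether any table has new data before doing real work.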

If you are deploying your notebooks using Terraform, you can use this module that I wrote: https://github.com/tomarv2/terraform-databricks-workspace-management

