I have multiple datasets in Databricks that are updated on inconsistent schedules: database.A, database.B, and database.C.
database.A: updated on the first of every month (e.g. 1/1/2022, 2/1/2022), but sometimes has mid-cycle updates (e.g. 3/14/2022, 4/12/2022).
database.B: updated on the fifth of every month.
database.C: updated on the first of every quarter (e.g. 1/1/2022, 4/1/2022), but sometimes has a mid-cycle update (e.g. 5/1/2022).
My goal is to create a notebook that runs its processes whenever the data in any of these datasets is updated. For example:
data.updated.A <- some_code_or_function(database.A)
data.updated.B <- some_code_or_function(database.B)
data.updated.C <- some_code_or_function(database.C)
case when data.updated.A = TRUE or data.updated.B = TRUE or data.updated.C = TRUE then run_notebook else do_nothing_and_send_signal_1_day_from_now
Any ideas? Full disclosure: I am relatively new to Databricks, so I don't know whether I need to switch from SQL to Scala, Python, or R, but I am fully willing to. Should I consider another tactic besides scheduled processes? Thanks.
You can run the notebook as a job and schedule it with a cron expression: https://docs.databricks.com/jobs.html#create-a-job
If you are deploying your notebooks using Terraform, you can use this module that I wrote: https://github.com/tomarv2/terraform-databricks-workspace-management
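For the update check itself, a daily scheduled job could compare each table's latest modification time with the time of the previous run. The following is only a minimal sketch in Python, not a tested Databricks job. It assumes the three tables are Delta tables (so `DESCRIBE HISTORY` is available) and that you store the previous run time somewhere yourself. The names `updated_tables`, `delta_last_modified`, and `run_if_updated` are hypothetical helpers, not part of any Databricks API.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Table names taken from the question.
TABLES = ["database.A", "database.B", "database.C"]


def updated_tables(last_modified: Dict[str, datetime], last_run: datetime) -> List[str]:
    """Return the tables whose latest change is newer than the previous run."""
    return sorted(name for name, ts in last_modified.items() if ts > last_run)


def delta_last_modified(spark, table: str) -> datetime:
    """Fetch the newest commit time of a table.

    Assumption: `table` is a Delta table. DESCRIBE HISTORY lists commits
    newest first, so LIMIT 1 returns the latest one.
    """
    row = spark.sql(f"DESCRIBE HISTORY {table} LIMIT 1").collect()[0]
    return row["timestamp"]


def run_if_updated(
    last_modified: Dict[str, datetime],
    last_run: datetime,
    run_notebook: Callable[[List[str]], None],
) -> bool:
    """Call `run_notebook` with the changed tables, if there are any."""
    changed = updated_tables(last_modified, last_run)
    if changed:
        run_notebook(changed)
        return True
    # Nothing changed: do nothing and let the next scheduled run check again.
    return False
```

Inside a notebook, `last_modified` could be built with `{t: delta_last_modified(spark, t) for t in TABLES}`, and `run_notebook` could wrap `dbutils.notebook.run` with the path of your processing notebook. Because the job is scheduled daily, the "check again in one day" branch needs no extra code.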