
Trigger Azure Databricks when blob changes

I am parsing files from Azure Blob Storage using Spark in Azure Databricks. The blob container is mounted as DBFS. Right now I do this in a notebook with a hardcoded file name (the DBFS path). Instead, I want to trigger the notebook with the new DBFS path whenever a new blob is created. I have seen that Azure Functions offers a blob trigger, so: can I start a Databricks notebook/job from an Azure Function? The processing of each blob takes quite some time, so is it advisable to use Azure Functions in such cases, or is there some other way to achieve this?
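For the Azure Functions route, a blob-triggered function can start a Databricks job through the Jobs REST API (`POST /api/2.1/jobs/run-now`) and pass the new blob's name as a notebook parameter. The function only submits the run and returns, so the long blob processing happens on the Databricks cluster rather than inside the function. A minimal sketch, assuming a Databricks job already exists for the notebook and that the host, token, job ID, and the `file_path` parameter name are supplied by you (all hypothetical names):

```python
import os
import logging
import requests
import azure.functions as func


def main(myblob: func.InputStream):
    """Blob trigger: launch an existing Databricks job for the newly created blob."""
    # Connection details come from the Function App settings (hypothetical names).
    host = os.environ["DATABRICKS_HOST"]           # e.g. https://adb-1234567890.1.azuredatabricks.net
    token = os.environ["DATABRICKS_TOKEN"]         # PAT allowed to run the job
    job_id = int(os.environ["DATABRICKS_JOB_ID"])

    # myblob.name is "<container>/<blob path>"; map it onto the DBFS mount the notebook uses.
    dbfs_path = "/mnt/" + myblob.name

    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id, "notebook_params": {"file_path": dbfs_path}},
        timeout=30,
    )
    resp.raise_for_status()
    logging.info("Triggered Databricks run %s for %s", resp.json().get("run_id"), dbfs_path)
```

Because the function only enqueues the run, the Functions execution-time limit is not a concern; the heavy work runs on Databricks.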

As Parth Deb says, using Azure Data Factory will be easier for your requirement.

You just need to create a pipeline with a Databricks Notebook activity and add a storage event trigger based on 'blob created'; the trigger can pass the new blob's name into the pipeline as a parameter, which you then forward to the notebook.
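On the Databricks side, a parameter forwarded from the ADF Databricks Notebook activity's base parameters shows up as a notebook widget. A minimal notebook-cell sketch, assuming the activity passes a base parameter named "file_name" (for example mapped from the trigger's @triggerBody().fileName) and that the container is mounted at /mnt/mycontainer (both names are assumptions):

```python
# Databricks notebook cell: pick up the file name passed from the ADF activity.
dbutils.widgets.text("file_name", "")            # default keeps the notebook runnable manually
file_name = dbutils.widgets.get("file_name")

# Read the newly created blob through the existing DBFS mount; adjust the reader to your format.
df = spark.read.json(f"/mnt/mycontainer/{file_name}")
df.show()
```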

This is built-in functionality of Data Factory; you can check the documentation:

https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities

https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-notebook

https://docs.microsoft.com/en-us/azure/data-factory/how-to-expression-language-functions

You can look at the documents above. In the end, it mostly comes down to a few clicks in the portal.

I ended up using ADF. I created a new pipeline with a blob event trigger that fires based on the file names.
