
Using SQL Stored Procedure vs Databricks in Azure Data Factory

I have a requirement to write up to 500k records daily to Azure SQL DB using an ADF pipeline. The data transformation involves simple calculations that can be performed in a SQL Stored Procedure activity. I've also observed Databricks Notebooks being used commonly, especially for the scalability benefits going forward. But that brings overhead: placing files in another location after transformation, managing authentication, etc., and I want to avoid over-engineering unless absolutely required. I've tested the SQL Stored Procedure approach and it works quite well for ~50k records (not yet tested with higher volumes).

But I'd still like to know the general recommendation between the two options, especially from experienced Azure or data engineers. Thanks.

As an experienced (former) DBA, Data Engineer and data architect, I cannot see what Databricks adds in this situation. The piece of the architecture you might need to scale is the target for the INSERTs, i.e. Azure SQL Database, which is ridiculously easy to scale either manually via the portal or via the REST API, if that is even required. Consider techniques such as loading into heaps and partition switching if you need to tune the inserts.
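
For illustration, a minimal T-SQL sketch of that scale-up approach; the database name and service objectives are placeholders only:

    -- Scale the target Azure SQL Database up before the daily load window,
    -- then back down afterwards. The change is applied asynchronously;
    -- sys.dm_operation_status (in master) can be polled to see when it completes.
    ALTER DATABASE [MyTargetDb] MODIFY (SERVICE_OBJECTIVE = 'S3');  -- scale up

    -- ... run the load ...

    ALTER DATABASE [MyTargetDb] MODIFY (SERVICE_OBJECTIVE = 'S1');  -- scale back down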

The overhead of adding an additional component to your architecture and routing your data through it would have to be worth it, plus the additional cost of spinning up Spark clusters while your database is also running.

Databricks is a superb tool with a number of great use cases, e.g. advanced data transforms (i.e. things you cannot do with SQL), machine learning, streaming and others. Have a look at this free resource for a few ideas:

https://databricks.com/p/ebook/the-big-book-of-data-science-use-cases

I'm not sure there is enough information to make a solid recommendation. What is the source of the data? Why is ADF part of the solution? Is this 500K rows once per day or a constant stream? Are you loading into a staging table and then using a stored procedure to move and transform the data into another table?

Here are a couple of thoughts:

  1. If the data operation is SQL to SQL [meaning the same SQL instance for both source and sink], then use Stored Procedures. This allows you to stay close to the metal and will perform the best. An exception would be if the computational load is really complicated, but that doesn't appear to be the case here (a minimal pattern is sketched after this list).

  2. Generally speaking, the only reason to call Databricks from ADF is if you already have that expertise and the resources already exist to support it.
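
To make option 1 concrete, here is a rough sketch of the staging-to-target pattern that an ADF Stored Procedure activity could call; all table, column and procedure names below are illustrative only:

    -- Illustrative only: land raw rows in dbo.Sales_Staging (e.g. via an ADF Copy
    -- activity), then let a set-based INSERT ... SELECT do the simple calculation.
    CREATE OR ALTER PROCEDURE dbo.usp_LoadDailySales
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.Sales (SaleDate, CustomerId, Quantity, UnitPrice, TotalAmount)
        SELECT  s.SaleDate,
                s.CustomerId,
                s.Quantity,
                s.UnitPrice,
                s.Quantity * s.UnitPrice AS TotalAmount   -- the "simple calculation"
        FROM    dbo.Sales_Staging AS s;

        TRUNCATE TABLE dbo.Sales_Staging;                 -- clear staging for the next run
    END;

The whole transform runs inside the database engine, so there is no extra compute to provision, secure or pay for.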

Since ADF is part of the story, there is a middle ground between your two scenarios - Data Flows. Data Flows are a low-code abstraction over Databricks. They are ideal for in-flight data transforms and perform very well at high loads. You do not author or deploy notebooks, nor do you have to manage the Databricks configuration. And they are first-class citizens in ADF pipelines.
