
Azure Data Lake for Structured Data

We've been reviewing the Modern Data Warehouse architectures from Microsoft (link here), which reference using Azure Data Factory to pull structured and unstructured data into the Azure Data Lake. I've attended a lot of presentations on the subject as well, but people are split on whether the Data Lake is a good home for structured data. What I am trying to determine is whether importing data into the Data Lake is a good strategy if the only sources we will be using are on-prem SQL Server databases. And what would be the advantages and disadvantages of that strategy?

For context, we're looking for a single pane of glass for consumption, whether it's end users' reporting with Power BI or a feed for Azure Data Warehouse / an on-prem data warehouse. We want one container that is the source for all of these systems and that is not the source OLTP system (i.e. OLTP database --> (Azure Data Factory) --> Data Lake --> everything else).

I appreciate any guidance on the subject. Thank you.

You have not mentioned the data size, and I think data size is a very important factor in deciding whether to move to ADL. In your case the data is entirely structured. If you had massive, unstructured data and wanted to use ADB, Hadoop, or another technology to process it later, I think ADL would be a good candidate.

You should also consider that the data is encrypted in motion using SSL. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls.

The only real value in taking structured data, flattening it, and loading it into a data lake is to save cost and decouple the data from any proprietary tool or compute. In your scenario, it will be less expensive to store the data in a data lake store than in Azure SQL Database.

However, there is a complexity cost to flattening the data. You will need to restructure the data (i.e. load it back into a database, or wrap a logical structure around it) when you need to consume it. Formats such as Parquet will help with this, but it is more complex for users to query data in a data lake than it is to connect to a relational database. Almost all analysts and data consumers will know how to query a relational database, especially if the data is already in SQL Server.
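To make that complexity cost concrete, here is a minimal sketch using only the Python standard library. It assumes a made-up two-table schema, with an in-memory SQLite database standing in for the on-prem SQL Server source and a CSV string standing in for a flat file in the lake; none of these names come from your environment. It flattens the tables into one file, then shows that the consumer has to cast types and aggregate by hand to get what a single SQL query returns directly.

```python
import csv
import io
import sqlite3


def build_source_db():
    """Create an in-memory stand-in for the OLTP database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Contoso'), (2, 'Fabrikam');
        INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 20.0), (12, 2, 5.25);
    """)
    return conn


def flatten_to_csv(conn):
    """Denormalize both tables into one flat CSV, as a lake extract might."""
    cur = conn.execute("""
        SELECT o.id AS order_id, c.name AS customer, o.amount
        FROM orders o JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
    """)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue()


def read_flat_file(text):
    """Read the flat file back; every value arrives as a plain string."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by_customer(rows):
    """Re-impose structure on the consumer side: cast types, then aggregate."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + float(row["amount"])
    return totals


conn = build_source_db()
rows = read_flat_file(flatten_to_csv(conn))
totals = total_by_customer(rows)

# The same answer straight from the relational source, in one query.
sql_totals = dict(conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
"""))
```

A typed columnar format such as Parquet would preserve the column types that CSV drops here, which is the sense in which it helps, but the consumer still needs an engine or library that can read it rather than a plain database connection.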

Look at the volume of data and the use cases for consumption to make that decision. A "logical data lake" can include structured data in a relational database, semi-structured data flattened in a storage account, and unstructured data saved to a storage account.
