
Difference between Delta Lake and Lake Database in Azure Synapse

I'm building a lakehouse architecture in Azure Synapse and am unsure whether to use Delta Lake or a lake database.

Both seem to offer roughly the same functionality: I can use Spark for ETL tasks, and then query the data with Spark pools as well as serverless SQL pools.

In the Azure documentation, a lake database is defined as:

"A lake database provides a relational metadata layer over one or more files in a data lake. You can create a lake database that includes definitions for tables, including column names and data types as well as relationships between primary and foreign key columns. The tables reference files in the data lake, enabling you to apply relational semantics to working with the data and querying it using SQL. However, the storage of the data files is decoupled from the database schema; enabling more flexibility than a relational database system typically offers."

Whereas Delta Lake is defined as:

Delta Lake is an open-source storage layer that adds relational database semantics to Spark-based data lake processing. Delta Lake is supported in Azure Synapse Analytics Spark pools for PySpark, Scala, and .NET code.

The benefits of using Delta Lake in a Synapse Analytics Spark pool include:

Relational tables that support querying and data modification. With Delta Lake, you can store data in tables that support CRUD (create, read, update, and delete) operations. In other words, you can select, insert, update, and delete rows of data in the same way you would in a relational database system.
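For a concrete picture, here is a minimal sketch of those CRUD operations against a Delta table from a Synapse Spark pool notebook (PySpark). The storage path, column names, and values are placeholders for illustration, and `spark` is the session a Synapse notebook already provides:

```python
from delta.tables import DeltaTable

# Placeholder path in an ADLS Gen2 container (adjust to your workspace).
delta_path = "abfss://files@mydatalake.dfs.core.windows.net/delta/products"

# Create: write a DataFrame out in Delta format.
df = spark.createDataFrame(
    [(1, "Widget", 2.99), (2, "Gadget", 9.99)],
    ["ProductID", "ProductName", "Price"],
)
df.write.format("delta").mode("overwrite").save(delta_path)

# Read: load the Delta table back.
spark.read.format("delta").load(delta_path).show()

# Update and delete: row-level changes via the DeltaTable API.
delta_table = DeltaTable.forPath(spark, delta_path)
delta_table.update(condition="ProductID = 1", set={"Price": "3.49"})
delta_table.delete("ProductID = 2")
```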

What are the differences between Delta Lake and a Lake Database (if any) in Azure Synapse? Or are they simply two different tools to achieve roughly the same results? Are there concrete benefits of using one over the other?

The Lake Database is a facility that Microsoft added to Synapse Analytics that uses Spark SQL (Hive) managed tables to provide a database abstraction layer over your Parquet, CSV, or Delta tables. It uses the Hive Metastore, which keeps track of the database contents: tables, schemas, views, etc. If you use Delta tables in it, you still get all the additional metadata that is part of Delta Lake's change tracking, but that Delta table metadata is not part of the Lake Database metastore. I am using the free, open-source Linux Foundation distribution of Delta Lake.
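To illustrate, here is a minimal sketch of how a managed Delta table ends up registered in that metastore from a Synapse Spark pool notebook; the database and table names are placeholders I chose, and `spark` is the notebook's built-in session:

```python
# Create a Spark database; it is registered in the Hive Metastore and
# surfaces in Synapse Studio under the lake databases.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_lake_db")

# Create a managed table in Delta format; the schema and location are
# tracked in the Hive Metastore, while Delta keeps its own transaction
# log (_delta_log) alongside the data files.
df = spark.createDataFrame(
    [(1, "2024-01-15", 120.50), (2, "2024-01-16", 75.00)],
    ["OrderID", "OrderDate", "Amount"],
)
(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("sales_lake_db.orders"))

# The table can now be queried by name rather than only by file path.
spark.sql("SELECT * FROM sales_lake_db.orders").show()
```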

If you configure your Delta Lake properly, you can get it to appear in Synapse Studio as a Lake Database. One advantage of the Lake Database is that in Synapse data flows you can use the Workspace DB source type instead of an Integration Dataset; it is designed for Lake Databases and works with the database-and-table model, so you don't have to define a pile of integration datasets.

I am in the process of setting this up for a client and still discovering the details. Documentation is plentiful for the individual pieces, but nothing exists for the whole: how to configure it and how it all works together. So please excuse any inaccurate statements here. There are many nuances to know in order to integrate the open-source Delta Lake into the Lake Database and Synapse pipelines. What you get with this stack should be similar to what you get in the Databricks version of Delta Lake, except here the configuration is all on you and you need some luck figuring it out.
