
How can I transition from Azure Data Lake, with data partitioned by date folders, to Delta Lake?

I own an Azure Data Lake Storage Gen2 account with data partitioned into nested datetime folders.

I want to provide the Delta Lake format to my team, but I am not sure whether I should create a new storage account and copy the data into Delta format, or whether it would be better practice to transform the current Azure Data Lake into Delta Lake format.

Could anyone provide any tips on this matter?

AFAIK, in Azure Data Factory (ADF) the Delta format is supported only as an inline dataset, and inline datasets are available only in Data flows.

So, my suggestion is to use Data flows for this.
As you have the data in nested datetime folders, I reproduced this with sample dates and uploaded a sample CSV file into each of the two day folders, 10 and 9.


Create a data flow in ADF and, in the source, select an inline dataset so that you can give the wildcard path you want. Select your data format (Delimited text in my case) and give the linked service as well.


Assuming that the nested folder structure is the same for all files, give the wildcard path according to the depth of your path.
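To make the wildcard idea concrete, here is a minimal Python sketch of which files a one-`*`-per-folder-level pattern would pick up. The folder layout (`input/<year>/<month>/<day>/`), the file names, and the helper `matches_wildcard` are all hypothetical, and the matching is done with Python's `fnmatch`, not with ADF's own matcher, whose exact rules may differ. It only illustrates why the pattern depth has to match the folder depth.

```python
from fnmatch import fnmatchcase


def matches_wildcard(path: str, pattern: str) -> bool:
    """Match level by level so that '*' never crosses a '/' boundary."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    # A pattern with N levels only matches paths with exactly N levels.
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


# Hypothetical layout: input/<year>/<month>/<day>/<file>.csv
files = [
    "input/2022/10/09/sample.csv",
    "input/2022/10/10/sample.csv",
    "input/2022/10/sample.csv",     # one level too shallow
    "input/2022/10/10/readme.txt",  # wrong extension
]

pattern = "input/*/*/*/*.csv"
matched = [f for f in files if matches_wildcard(f, pattern)]
print(matched)
```

Only the two CSV files that sit exactly three folders below `input` are selected, which is the behaviour you want from the source wildcard path when every file shares the same nesting depth.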


Now, create a Delta format sink.


Give the linked service here as well.
In the sink settings, give the folder for your Delta files and the Update method.


After execution, you can see that the Delta format files were created in the folder path.
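If you want to check the result outside the portal, a Delta table folder can be recognised by its `_delta_log` subfolder, which holds JSON commit files with 20-digit zero-padded names next to the Parquet data files. The sketch below is only an illustration under assumptions: it runs against a local (or mounted) folder using `pathlib`, the helper name `looks_like_delta_table` and the folder names are hypothetical, and checking ADLS Gen2 directly would need a storage SDK instead.

```python
import re
import tempfile
from pathlib import Path

# Delta commit files are named with a 20-digit zero-padded version number.
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def looks_like_delta_table(folder: Path) -> bool:
    """Return True if the folder has a _delta_log with at least one commit file."""
    log_dir = folder / "_delta_log"
    if not log_dir.is_dir():
        return False
    return any(COMMIT_FILE.match(p.name) for p in log_dir.iterdir())


# Demo on throwaway local folders (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "delta_out"
    (table / "_delta_log").mkdir(parents=True)
    (table / "_delta_log" / ("0" * 20 + ".json")).write_text("{}")

    plain = Path(tmp) / "csv_only"
    plain.mkdir()

    is_delta = looks_like_delta_table(table)
    is_plain_delta = looks_like_delta_table(plain)

print(is_delta, is_plain_delta)
```

This is a shallow structural check: it does not parse the log or validate the data files, it only confirms that the sink wrote a transaction log.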

