How can I transition from Azure Data Lake, with data partitioned by date folders, into Delta Lake?
I own an Azure Data Lake Gen2 account with data partitioned into nested datetime folders.
I want to provide the Delta Lake format to my team, but I am not sure whether I should create a new storage account and copy the data into Delta format, or whether it would be best practice to transform the current Azure data lake into Delta Lake format.
Could anyone provide any tips on this matter?
AFAIK, in ADF the Delta format is supported only as an inline dataset, and inline datasets are available only in data flows.
So my suggestion is to use a data flow for this.
As you have the data in nested datetime folders, I reproduced this with sample date folders. I uploaded a sample CSV file to each of the folders 10 and 9.
Create a data flow in ADF and, in the source, select an inline dataset so that you can give the wildcard path you want. Select your data format (Delimited text in my case) and give the linked service as well.
Assuming that the nested folder structure is the same for all files, give a wildcard path that matches the depth of your folder levels.
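To check a wildcard path before running the data flow, a small local sketch like the one below may help. It assumes a hypothetical layout of `input/<year>/<month>/<day>/<file>.csv` (the root folder name, the depth, and the helper `wildcard_for_depth` are illustrative, not taken from the original post), and it uses Python's `pathlib` matching only as an approximation of how a one-`*`-per-level wildcard behaves.

```python
from pathlib import PurePosixPath


def wildcard_for_depth(root: str, depth: int, extension: str = "csv") -> str:
    """Build a wildcard path with one `*` per nested folder level under `root`.

    `root`, `depth` and `extension` are illustrative parameters (assumptions).
    """
    if depth < 0:
        raise ValueError("depth must be zero or positive")
    parts = [root.strip("/")] + ["*"] * depth + [f"*.{extension}"]
    return "/".join(parts)


# Hypothetical layout: three nested date levels (year/month/day) under "input".
pattern = wildcard_for_depth("input", 3)
print(pattern)

# Hypothetical sample paths to compare against the pattern.
samples = [
    "input/2022/10/09/sample.csv",  # three levels deep: expected to match
    "input/2022/10/sample.csv",     # only two levels deep: expected not to match
]
for sample in samples:
    print(sample, PurePosixPath(sample).match(pattern))
```

If the folders are nested to a different depth, changing the `depth` argument gives the corresponding pattern to paste into the source's wildcard path field.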
Now create a sink that uses the Delta format.
Give the linked service here as well.
In the sink settings, give the folder path for your Delta files and choose the Update method.
After execution, you can see that the Delta format files were created in the folder path.
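One way to sanity-check the output is to look for the `_delta_log` folder, which a Delta table keeps next to its Parquet data files and which holds JSON commit files. The sketch below is a minimal check under the assumption that the sink folder is reachable as a local or mounted path; the function name `looks_like_delta_table` and the folder names are hypothetical, and the demo builds a throwaway folder rather than touching a real lake.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(folder: Path) -> bool:
    """Return True if `folder` has a `_delta_log` directory with a JSON commit file."""
    log_dir = folder / "_delta_log"
    return log_dir.is_dir() and any(log_dir.glob("*.json"))


# Demo on a temporary folder that imitates a Delta output (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "delta_output"
    (table / "_delta_log").mkdir(parents=True)
    # The first commit of a Delta table is a zero-padded, 20-digit JSON file.
    (table / "_delta_log" / "00000000000000000000.json").write_text("{}")
    (table / "part-00000.snappy.parquet").write_bytes(b"")

    print(looks_like_delta_table(table))      # folder with a commit log
    print(looks_like_delta_table(Path(tmp)))  # parent folder without a commit log
```

This only confirms that a transaction log exists; it does not validate the contents of the commits or the data files.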