How can I transition from Azure Data Lake, with data partitioned by date folders, into Delta Lake?

I own an Azure Data Lake Gen2 account with data partitioned by nested datetime folders.

I want to provide the Delta Lake format to my team, but I am not sure whether I should create a new storage account and copy the data into Delta format, or whether it would be best practice to transform the current Azure Data Lake into Delta Lake format.

Could anyone provide any tips on this matter?

AFAIK, the Delta format is supported only as an inline dataset, and inline datasets are available only in Data flows.

So, my suggestion is to use Data flows for this.
As your data is in nested datetime folders, I reproduced this with sample dates. I uploaded a sample CSV file to each of the folders 10 and 9.

Create a data flow in ADF and, in the source, select an inline dataset so that you can give the wildcard path you want. Select your data format (Delimited text in my case) and give the linked service as well.

Assuming that the nested folder structure is the same for all files, give a wildcard path that matches the depth of your folder levels.
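To illustrate the idea of one wildcard per folder level, here is a minimal Python sketch. It is not ADF code, and ADF's own wildcard rules may differ in detail. The `input/<year>/<month>/<day>/` layout, the file names, and the `matches_wildcard` helper are all hypothetical, chosen only to mirror the sample folders 10 and 9.

```python
from fnmatch import fnmatchcase


def matches_wildcard(path: str, pattern: str) -> bool:
    """Match a path against a pattern, one wildcard segment per folder level."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    # The path must have exactly as many levels as the pattern.
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


# Hypothetical file listing: year/month/day folders under "input".
files = [
    "input/2022/11/09/sample.csv",
    "input/2022/11/10/sample.csv",
    "input/2022/11/readme.txt",  # one level too shallow, and not a CSV
]

# One "*" per nested date folder, then the file name pattern.
pattern = "input/*/*/*/*.csv"

selected = [f for f in files if matches_wildcard(f, pattern)]
print(selected)
```

With this layout, only the two CSV files that sit exactly three folders below `input` are selected. If your folders are nested more or less deeply, the number of `*` segments has to change accordingly.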

Now, create a Delta format sink.

Give the linked service here as well.
In the sink settings, give the folder for your Delta files and the Update method.

After execution, you can see that the Delta format files were created in the folder path.
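If you want to check the result outside the portal, a rough sanity check is to look for the `_delta_log` folder that a Delta table keeps next to its Parquet data files. The sketch below assumes the sink folder is reachable as a local or mounted path. The `looks_like_delta_table` helper and the folder names are hypothetical, and the temporary directory only simulates the output so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(folder: Path) -> bool:
    """Return True if the folder has a _delta_log directory with a JSON commit file."""
    log_dir = folder / "_delta_log"
    if not log_dir.is_dir():
        return False
    return any(log_dir.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    # Simulated sink folder (hypothetical name) with a transaction log entry
    # and one data file, standing in for what the data flow would write.
    table = Path(tmp) / "delta_output"
    (table / "_delta_log").mkdir(parents=True)
    (table / "_delta_log" / "00000000000000000000.json").write_text("{}")
    (table / "part-00000.snappy.parquet").write_bytes(b"")

    # A plain folder without a transaction log, for comparison.
    plain = Path(tmp) / "plain_csv"
    plain.mkdir()

    delta_ok = looks_like_delta_table(table)
    plain_ok = looks_like_delta_table(plain)

print(delta_ok, plain_ok)
```

This only checks the folder layout. It does not validate the contents of the transaction log or the Parquet files.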
