
How to handle or architect incremental data ingestion in Azure Data Lake Store?

Posted 2017-04-19 14:23:51 | Tags: azure, azure-sql-database, azure-data-lake

I have two custom-code DLLs that handle images from IP cameras.

dll-One: Extracts images from the IP cameras and stores them in Azure Data Lake Store.

For example:

/adls/clinic1/patientimages
/adls/clinic2/patientimages

dll-two: Uses those images, extracts information from them, and loads the data into RDBMS tables.

For instance, say the RDBMS contains the entities dimpatient, dimclinic, and factpatientVisit.

To start, a one-time export of this data can be written to defined locations in Azure Data Lake Store.

For example:

/adls/dimpatient
/adls/dimclinic
/adls/factpatientVisit

Question: How can I push incremental data into the same file, or how else can this incremental load be handled in Azure Data Lake Analytics?

This is like implementing a data warehouse in Azure Data Lake Analytics.

Note: I do not want to use Azure SQL Database or any other storage service offered by Azure. Why pay for other Azure services if one type of storage can hold all types of data?

adls is the name of my ADLS storage account.

1 answer

I am not sure I completely understand your question, but you can organize your data files in Azure Data Lake Store, or your rows in partitioned U-SQL tables, along a time dimension, so that each increment is added as a new partition or file rather than appended to an existing one. In general, though, we recommend that such increments be of substantial size, to preserve the ability to scale.
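
To make the file-per-increment idea concrete, here is a minimal Python sketch of how increment paths could be laid out along a time dimension. It only builds path strings and does not call any Azure API. The year/month/day folder layout, the file naming, and the helper names partition_path and partitions_since are illustrative assumptions, not something prescribed by Azure Data Lake Store or by the answer above.

```python
from __future__ import annotations

import posixpath
from datetime import date, timedelta


def partition_path(root: str, entity: str, day: date) -> str:
    """Return a date-partitioned file path for one daily increment.

    Assumption: one file per entity per day, under year/month/day folders.
    """
    return posixpath.join(
        root,
        entity,
        f"{day:%Y}",
        f"{day:%m}",
        f"{day:%d}",
        f"{entity}_{day:%Y%m%d}.csv",
    )


def partitions_since(root: str, entity: str, last_loaded: date, today: date) -> list[str]:
    """List the partition paths for each day after last_loaded, up to and including today."""
    days = (today - last_loaded).days
    return [
        partition_path(root, entity, last_loaded + timedelta(days=offset))
        for offset in range(1, days + 1)
    ]


if __name__ == "__main__":
    # Each increment gets its own new file; earlier files are never rewritten.
    print(partition_path("/adls", "factpatientVisit", date(2017, 4, 19)))

    # Paths a downstream job would still need to pick up after the last load.
    for path in partitions_since(
        "/adls", "factpatientVisit", date(2017, 4, 17), date(2017, 4, 19)
    ):
        print(path)
```

A daily grain is used here only for illustration. If a day's increment is small, a coarser grain (for example, one file per week or month) would be more in line with the advice to keep increments substantial.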