
Is there any need for a Data Warehouse when using Azure Data Lake?

I am exploring Azure Data Lake and am new to this field. I have explored many things and read many articles. Basically, I have to develop a Power BI dashboard from data that comes from different sources.

In the classic SQL Server stack, I can write an ETL (Extract, Transform, Load) process to bring the data from my system databases into the Data Warehouse database, and then use that Data Warehouse with Power BI through SSAS etc.

But I want to use Azure Data Lake, and I have explored Azure Data Lake Store and Azure Data Lake Analytics (U-SQL). I drew the following architecture diagram.

[image: architecture diagram]

  1. Is there anything I am missing in the current flow of the application?
  2. I can get data directly from Azure Data Lake using Power BI, so there is no need for a Data Warehouse. Am I right?
  3. I can create a database in Azure Data Lake. Will that be my Data Warehouse?
  4. What is the best format for the output data produced from the original files in Azure Data Lake, e.g. CSV?

1 & 2) Currently ADLS has only limited support for letting Power BI query directly over it. If your data is too large (greater than about 10 GB, I believe), Power BI cannot work directly over the data in your ADLS account. In that case, I would recommend moving your processed data from ADLS into either a SQL Database or SQL Data Warehouse, as this allows Power BI to operate over larger amounts of data. You can use Azure Data Factory to move your data, or PolyBase if you are moving data into SQL DW.
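To make the rule of thumb above concrete, here is a minimal Python sketch of the decision. It assumes the "about 10 GB" figure from the answer, which is a recollection and not a documented limit; the function name and the returned labels are hypothetical and only for illustration.

```python
# Assumption: the ~10 GB figure is the answerer's recollection, not an official limit.
DIRECT_QUERY_LIMIT_BYTES = 10 * 1024**3


def choose_serving_layer(total_bytes: int,
                         limit_bytes: int = DIRECT_QUERY_LIMIT_BYTES) -> str:
    """Suggest where Power BI should read the processed data from.

    Returns a hypothetical label:
      "power_bi_direct"        -> query the ADLS output directly
      "sql_database_or_sql_dw" -> load into SQL Database / SQL DW first
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if total_bytes <= limit_bytes:
        return "power_bi_direct"
    return "sql_database_or_sql_dw"


if __name__ == "__main__":
    # Sizes of the processed output files, in bytes (made-up numbers).
    file_sizes = [3 * 1024**3, 4 * 1024**3, 6 * 1024**3]
    print(choose_serving_layer(sum(file_sizes)))
```

If the processed output grows over time, it is probably safer to plan for the SQL Database or SQL DW path from the start than to switch once the threshold is crossed.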

3) A data lake is still distinct from a data warehouse, and each has its own strengths and weaknesses. The data lake is best for storing your raw or lightly processed data, which may come in a variety of formats and schemas. After you process and filter this data using Azure Data Lake Analytics, you can move it into SQL DW for interactive analytics and data management (but at the cost of an inflexible schema).
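As a rough illustration of that trade-off (in plain Python, not U-SQL or any Azure API), the sketch below takes raw records with varying shapes, as they might sit in the lake, and keeps only those that fit a fixed warehouse schema. The column names and the `to_warehouse_rows` helper are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical fixed warehouse schema: column name -> function used to coerce the value.
WAREHOUSE_COLUMNS: Dict[str, Callable[[Any], Any]] = {
    "order_id": int,
    "amount": float,
    "country": str,
}


def to_warehouse_rows(raw_records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Project raw, loosely structured records onto the fixed schema.

    Records that lack a required column or hold a value that cannot be
    coerced are skipped; extra fields are dropped.
    """
    rows = []
    for record in raw_records:
        try:
            row = {name: cast(record[name]) for name, cast in WAREHOUSE_COLUMNS.items()}
        except (KeyError, TypeError, ValueError):
            continue  # does not fit the warehouse schema
        rows.append(row)
    return rows


if __name__ == "__main__":
    raw = [
        {"order_id": "1", "amount": "9.5", "country": "DE", "note": "extra field"},
        {"order_id": 2},  # missing columns, stays in the lake only
    ]
    print(to_warehouse_rows(raw))
```

The lake can hold both records as they are; the warehouse only accepts the one that matches its schema, which is the inflexibility mentioned above.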

4) It depends on your use case. If you plan to continue processing the data in ADLS, I recommend you output into an ADLS table for better performance. However, if you need to pass this data to another service, then CSV is a good choice. You can find more outputters, such as JSON and XML, on our GitHub.
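For the "pass it to another service" case, the following is a minimal sketch of what a well-formed CSV hand-off looks like, written with Python's standard `csv` module and not with a U-SQL outputter. The `rows_to_csv` helper and the sample columns are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_csv(rows: List[Dict[str, Any]], fieldnames: List[str]) -> str:
    """Serialize rows to CSV text with a header line.

    Values containing commas or quotes are quoted by the csv module,
    so a downstream consumer can parse them back without ambiguity.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Contoso, Ltd."},
        {"id": 2, "name": "Fabrikam"},
    ]
    print(rows_to_csv(rows, ["id", "name"]), end="")
```

Including a header row and relying on a CSV library for quoting tends to avoid the usual problems when the receiving service parses the file.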

This answer may not be timely, but what I've tried, and what is closer to your prior experience, is spinning up an instance of Azure Analysis Services. You can create a tabular model or MDX model, shove a ton of data into memory, and connect to it from Power BI. The "only" catch is that it can get pricey quickly. One great thing about AAS is that the interface for building a tabular model closely follows Power Query and uses DAX.

Also, I believe that these days the ADLA store is basically gone in favor of using blob storage directly, so you'd go data --> blob --> ADLA --> AAS --> Power BI.

