
Relationships in Azure Synapse (DWH)

I'm currently working in Azure Synapse DWH and I have some theoretical questions:

How can I create relationships between tables (dimensions and facts), and what would the implications be if I created those relationships?

I read that to create a primary key I would need to set a nonclustered table, but what does that mean?

Azure Synapse Analytics (ASA) has three engines:

  • serverless SQL pools (formerly SQL on-demand)
  • dedicated SQL pools (the next step on from Azure SQL Data Warehouse)
  • Apache Spark pools

None of these supports database relationships as of today. I suspect you mean dedicated SQL pools, and just to confirm, they do not support the FOREIGN KEY syntax. Relationships are more of an OLTP concept and are not common in big data platforms, which ASA is.

Therefore your options are to enforce these relationships downstream or on import to your warehouse. A common method is to identify unknown values and substitute them with a -1 / Unknown value on import. This ensures there are no NULLs in your key columns.
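
As a rough illustration of the -1 / Unknown substitution described above, here is a minimal Python sketch. The function name, the column name `customer_key`, and the convention that the dimension holds an "Unknown" row with key -1 are all assumptions made for the example, not part of any Synapse API; in practice the same logic would usually live in your import pipeline (T-SQL, Spark, or a data flow).

```python
from typing import Any

UNKNOWN_KEY = -1  # assumed surrogate key of the "Unknown" row in the dimension


def substitute_unknown_keys(
    fact_rows: list[dict[str, Any]],
    known_keys: set[int],
    key_column: str,
) -> list[dict[str, Any]]:
    """Return copies of the fact rows whose key column always holds a known dimension key.

    NULL (None), missing, and unmatched key values are all replaced by UNKNOWN_KEY.
    """
    cleaned = []
    for row in fact_rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row.get(key_column) not in known_keys:
            new_row[key_column] = UNKNOWN_KEY
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data: keys present in the customer dimension, plus three fact rows.
dim_customer_keys = {-1, 1, 2}
facts = [
    {"customer_key": 1, "amount": 10.0},
    {"customer_key": None, "amount": 5.0},  # NULL key
    {"customer_key": 99, "amount": 7.5},    # key with no matching dimension row
]
cleaned_facts = substitute_unknown_keys(facts, dim_customer_keys, "customer_key")
```

After this step every fact row points at an existing dimension row, so a downstream model can define the relationship without hitting NULLs or orphaned keys.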

Additionally, enforce your relationships downstream, e.g. in an Azure Analysis Services tabular model or a Power BI model.

If you really need relationships then, depending on your data volumes, you might consider Azure SQL Database, which supports data volumes up to 4TB alongside columnstore indexes, which give great compression.

Having a similar issue:
I cannot find an automated solution thus far.

I'm importing 'entities' from D365 into the data lake, and they do NOT come with the relationships.
It will also NOT suggest the "Related Tables".

Introduce: ETL of 'entities' using T-SQL and Spark; governance of PySpark, notebooks, schema, and T-SQL linting; orchestration of activities, pipelines, and workflows; etc.

OR, for small datasets and projects:

  1. Reverse look up each table needed.
  2. In Azure Synapse, create a new DataFlow and download the .PBIX.
  3. Do your ETL: create the primary fact and dimension tables by whatever means, such as using a PowerPivot unique/distinct DAX expression on a Customer.Table.
  4. Once complete, if you like, import the newly ETL'd primary tables into the data lake.
  5. Repeat step 2.
  6. Create the relationships with Power BI. (Ideally, if the ETL is done correctly, PBI will auto-find the relationships.)
  7. Re-publish the .PBIX with the relationships as a "DataFlow".
     a. You must create relationships for every Dataflow; dataflows cannot be combined. Measures and Dataflows will consume resources and require performance analysis if they grow.
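
Step 3 above (deriving a dimension from the distinct values of a source column, then keying the fact table on it) can be sketched in plain Python as follows. This is only an illustration of the idea, not the PowerPivot/DAX approach mentioned in the step; the function names, the `customer` column, and the `customer_key` column are hypothetical.

```python
from typing import Any


def build_dimension(rows: list[dict[str, Any]], natural_key: str) -> dict[Any, int]:
    """Map each distinct, non-NULL natural key to a surrogate key (1, 2, 3, ...).

    Keys are numbered in first-seen order.
    """
    surrogate_keys: dict[Any, int] = {}
    for row in rows:
        value = row.get(natural_key)
        if value is not None and value not in surrogate_keys:
            surrogate_keys[value] = len(surrogate_keys) + 1
    return surrogate_keys


def build_fact(
    rows: list[dict[str, Any]],
    surrogate_keys: dict[Any, int],
    natural_key: str,
    fk_column: str,
) -> list[dict[str, Any]]:
    """Replace the natural key on each row with the dimension's surrogate key.

    Rows whose natural key is NULL or unknown get None in fk_column;
    deciding what to do with those is left to the caller.
    """
    fact = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != natural_key}
        new_row[fk_column] = surrogate_keys.get(row.get(natural_key))
        fact.append(new_row)
    return fact


# Hypothetical source extract with a repeated customer and a NULL customer.
source = [
    {"customer": "Ann", "amount": 10},
    {"customer": "Bob", "amount": 5},
    {"customer": "Ann", "amount": 7},
    {"customer": None, "amount": 1},
]
dim_customer = build_dimension(source, "customer")
fact_sales = build_fact(source, dim_customer, "customer", "customer_key")
```

Because the dimension holds each customer exactly once, the resulting relationship from `dim_customer` to `fact_sales` is one-to-many, which is the shape Power BI expects when you create the relationship in step 6.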

At some point 'Dataverse' may allow D365 data, making this easier.

Depending on your 'cost/spend', cloning all of D365 still doesn't solve your relationship needs.

Two solutions I'm aware of thus far:

  1. Import the serverless DBOs into Power BI; model and create the dataset there. You can do massive ETL, including foreign key creation and filtering of NULL values to create primary keys for dimensions, aggregate data, create fact tables, etc. It's far easier than using the Synapse GUI. Drawbacks are PBI licensing related.

  2. Create a "Lake Database" (map as you go; great for 5 or fewer entities/tables). ETL is low-code. But I'm skeptical: after 40 hours of training, I feel I should have just learned how to script this in Workbook/Spark.

  3. Do BOTH: use Power BI to develop your model and test it. Then go back to Synapse and deploy the working model as a pipeline or lake database.

Points of clarity on the top posted solution:

  1. Do not trust the auto-relationship detection in Power BI; stay away from pre-made REFID relationships in PBI unless you know for sure this is what you want. (Step 6, original poster: if the ETL is correct, it's a 1:M.)

  2. Publishing with .PBIX has its limitations with sharing and other issues the OP mentioned. Lake Database might be the workaround if you have Tableau, Python, or Qlik as your solution.

  3. Dataverse is coming, and PBI analytics as well as predictive analysis with HDInsight will be embedded into D365. You will also be able to create drag-and-drop dashboards. As of 08-05-2022 this is already working in its infancy; even though they want you to go modular, with a hybrid serverless setup you can STILL pull the aggregate measures from D365 into Synapse and reverse engineer them.
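
Regarding point 1 above: rather than trusting an auto-detected relationship, you can verify yourself that it really is 1:M before modelling it. The sketch below is one possible check, written in plain Python with hypothetical function and column names; it assumes both tables fit in memory, so for real volumes the same two checks would be done in SQL or Spark.

```python
from collections import Counter
from typing import Any


def check_one_to_many(
    dim_rows: list[dict[str, Any]],
    fact_rows: list[dict[str, Any]],
    dim_key: str,
    fact_key: str,
) -> dict[str, Any]:
    """Check the two conditions for a clean 1:M relationship.

    1. Every key on the dimension ("one") side is unique.
    2. Every key on the fact ("many") side exists in the dimension.
    """
    key_counts = Counter(row[dim_key] for row in dim_rows)
    duplicate_keys = [key for key, count in key_counts.items() if count > 1]
    # dict.fromkeys de-duplicates while keeping first-seen order.
    orphan_keys = list(
        dict.fromkeys(
            row[fact_key] for row in fact_rows if row[fact_key] not in key_counts
        )
    )
    return {
        "duplicate_keys": duplicate_keys,
        "orphan_keys": orphan_keys,
        "is_one_to_many": not duplicate_keys and not orphan_keys,
    }


# Hypothetical tables: key 2 is duplicated in the dimension, key 3 is orphaned in the fact.
dim = [{"id": 1}, {"id": 2}, {"id": 2}]
fact = [{"customer_id": 1}, {"customer_id": 3}, {"customer_id": 3}]
report = check_one_to_many(dim, fact, "id", "customer_id")
```

If `is_one_to_many` comes back false, the lists of duplicate and orphan keys point to the ETL step that needs fixing before the relationship is created.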
