Is there any performance impact from having NULLs in a foreign key column in a data mart?

We are currently working on a data mart design. We have many foreign keys to dimension tables. We are trying to decide whether to allow NULL in the foreign key columns or to use -1 to represent NULL values.

Kimball suggests keeping a default row for NULL values: http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-table-null/

My lead suggests keeping NULL as NULL.

Will there be any performance impact from keeping NULL in foreign key fields?

Kimball is right (as he usually is). Use a default value wherever you would otherwise use NULL.

Why? It ensures that joins to the dimensions will not "accidentally" filter rows: an inner join on a NULL key never matches, so those fact rows silently drop out of any query that joins to that dimension. Trying to reconcile results from different queries eats up a lot of time, and ensuring that joins succeed is one way of reducing such discrepancies.
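To make the "accidental filtering" concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for whatever engine the mart actually runs on. The table and column names (dim_customer, fact_sales, customer_key) are hypothetical. It loads the same three sales twice, once with a NULL key for the missing customer and once with -1 pointing at an "Unknown" dimension row, and counts how many fact rows survive an inner join.

```python
import sqlite3


def count_joined_rows(use_default_row: bool) -> tuple[int, int]:
    """Return (fact rows, fact rows surviving an inner join to the dimension)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
        "customer_key INTEGER REFERENCES dim_customer(customer_key), amount REAL)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    if use_default_row:
        # Kimball-style default member that missing keys point at.
        conn.execute("INSERT INTO dim_customer VALUES (-1, 'Unknown')")
    missing_key = -1 if use_default_row else None
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 2, 20.0), (3, missing_key, 30.0)],
    )
    total = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    # NULL = anything is never true, so a NULL-keyed row cannot match here.
    joined = conn.execute(
        "SELECT COUNT(*) FROM fact_sales AS f "
        "JOIN dim_customer AS d ON f.customer_key = d.customer_key"
    ).fetchone()[0]
    conn.close()
    return total, joined


print(count_joined_rows(use_default_row=False))  # (3, 2): sale 3 vanished
print(count_joined_rows(use_default_row=True))   # (3, 3): sale 3 shows as "Unknown"
```

A LEFT JOIN would also keep the NULL row, but then every query writer has to remember to use one; the default row makes the plain inner join safe.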

If you are not going to follow his advice, then store NULL. A value such as -1 with no matching dimension row is particularly bad, because it prevents the database from enforcing foreign key constraints.
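A small sketch of that constraint behaviour, again using sqlite3 as a stand-in (note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other engines enforce them by default). The schema is hypothetical. A NULL key is accepted because NULL foreign keys are not checked, a bare -1 is rejected, and -1 is accepted once the "Unknown" row actually exists, which is the option Kimball recommends.

```python
import sqlite3
from typing import Optional


def try_insert(customer_key: Optional[int], has_unknown_row: bool = False) -> str:
    """Insert one fact row and report whether the foreign key constraint allowed it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
        "customer_key INTEGER REFERENCES dim_customer(customer_key))"
    )
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Alice')")
    if has_unknown_row:
        conn.execute("INSERT INTO dim_customer VALUES (-1, 'Unknown')")
    try:
        conn.execute("INSERT INTO fact_sales VALUES (1, ?)", (customer_key,))
        return "accepted"
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return "rejected"
    finally:
        conn.close()


print(try_insert(None))                      # accepted: NULL is not checked
print(try_insert(-1))                        # rejected: no parent row for -1
print(try_insert(-1, has_unknown_row=True))  # accepted: -1 is a real member
```

If the -1 row is missing, the only way to load the data is to drop the constraint, and then nothing stops genuinely broken keys either.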

Another reason to avoid NULL that Gordon hasn't covered: it's unclear what a NULL means.

Sometimes you have a NULL in a data mart or data warehouse because something has gone wrong in the ETL or in a source system. Other times you have a NULL because that column doesn't apply to that particular row. Or, in the case of something like an accumulating snapshot table, it may be because that column has not been populated yet: the process being reported on hasn't yet reached the point where that column gets filled in.

Rather than a single default value, I like to set up several. For instance, you can give every dimension a row that indicates "Unknown", which you might use for missing values, and a row that indicates "N/A" for cases where the value does not apply. I tend to give these rows negative integer keys (-1 is Unknown, -2 is N/A, etc.), as that allows me to use the same keys for them in every table. But as both Kimball and Gordon indicate, you should actually create those rows in your dimensions.
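One way this convention might look in an ETL, sketched with sqlite3 under assumptions of my own: the dimension layout (dim_key, natural_key, label) and the helper names create_dimension and resolve_key are hypothetical. Every dimension is seeded with the same -1/-2 members, and the key lookup returns one of them instead of ever writing NULL into the fact table.

```python
import sqlite3
from typing import Optional

UNKNOWN_KEY = -1  # value expected but missing, or not (yet) matched
NA_KEY = -2       # value does not apply to this fact row


def create_dimension(conn: sqlite3.Connection, table: str) -> None:
    """Create a dimension and seed the special members, same keys in every table."""
    # `table` is interpolated into the SQL, so it must be a trusted identifier.
    conn.execute(
        f"CREATE TABLE {table} (dim_key INTEGER PRIMARY KEY, "
        "natural_key TEXT UNIQUE, label TEXT NOT NULL)"
    )
    conn.executemany(
        f"INSERT INTO {table} (dim_key, natural_key, label) VALUES (?, NULL, ?)",
        [(UNKNOWN_KEY, "Unknown"), (NA_KEY, "N/A")],
    )


def resolve_key(conn: sqlite3.Connection, table: str,
                natural_key: Optional[str], applies: bool) -> int:
    """Map a source value to a surrogate key; never returns NULL."""
    if not applies:
        return NA_KEY
    if natural_key is None:
        return UNKNOWN_KEY
    row = conn.execute(
        f"SELECT dim_key FROM {table} WHERE natural_key = ?", (natural_key,)
    ).fetchone()
    return row[0] if row else UNKNOWN_KEY


conn = sqlite3.connect(":memory:")
for name in ("dim_customer", "dim_promotion"):
    create_dimension(conn, name)
conn.execute("INSERT INTO dim_customer VALUES (1, 'C001', 'Alice')")

print(resolve_key(conn, "dim_customer", "C001", applies=True))  # 1
print(resolve_key(conn, "dim_customer", "C999", applies=True))  # -1 (no such member)
print(resolve_key(conn, "dim_promotion", None, applies=False))  # -2 (does not apply)
```

Because the special keys are identical everywhere, reports and checks can treat "-1" and "-2" the same way regardless of which dimension they come from.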

This makes it really easy to run data quality checks looking for cases where something has gone wrong. It also means your reporting and analysis tools can display meaningful values, so people can filter out rows that haven't been fully populated if they want to, and your data stewards can look for problematic data via those same tools. Or perhaps people might want to look specifically for those rows where one of the dimensions isn't applicable.
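As a sketch of such a data quality check (hypothetical table and column names, sqlite3 again as the stand-in), the query just counts fact rows pointing at each special member per foreign key column. With plain NULLs you would get a single "missing" count; with two members you can tell "something went wrong" apart from "doesn't apply".

```python
import sqlite3


def special_member_report(conn: sqlite3.Connection, fact_table: str,
                          key_columns: list[str]) -> dict[str, dict[str, int]]:
    """Count rows on Unknown (-1) and N/A (-2) for each foreign key column."""
    report = {}
    for col in key_columns:
        # Identifiers are interpolated, so they must come from trusted config.
        # In SQLite a comparison yields 0 or 1, so SUM counts matching rows.
        unknown, na = conn.execute(
            f"SELECT COALESCE(SUM({col} = -1), 0), COALESCE(SUM({col} = -2), 0) "
            f"FROM {fact_table}"
        ).fetchone()
        report[col] = {"unknown": unknown, "n/a": na}
    return report


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
    "customer_key INTEGER, promotion_key INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, -2), (2, -1, -2), (3, 1, 5), (4, -1, -1)],
)
report = special_member_report(conn, "fact_sales", ["customer_key", "promotion_key"])
print(report)
# {'customer_key': {'unknown': 2, 'n/a': 0}, 'promotion_key': {'unknown': 1, 'n/a': 2}}
```

A non-zero "unknown" count is what you would alert on; the "n/a" count is usually expected and just informational.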

If you have a situation where data sometimes loads in the "wrong" order (i.e. a fact table gets populated before the relevant members have been added to a dimension), you can also use this in your ETL to find the rows that need updating and fix them automatically, without repeatedly retrying rows that will never need updating because the value genuinely doesn't apply. With plain NULLs, you couldn't tell those rows apart from the ones waiting on a dimension member.
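A minimal sketch of that late-arriving-member fix, assuming (hypothetically) that the fact table keeps the source system's natural key, customer_id, alongside the surrogate customer_key so that it can be re-resolved later. Only rows parked on Unknown (-1) are candidates; rows on N/A (-2) are never retried.

```python
import sqlite3


def fix_late_arriving_customers(conn: sqlite3.Connection) -> int:
    """Re-point fact rows on Unknown (-1) once their customer exists; return rows fixed."""
    cur = conn.execute(
        """
        UPDATE fact_sales
        SET customer_key = (SELECT d.customer_key FROM dim_customer AS d
                            WHERE d.customer_id = fact_sales.customer_id)
        WHERE customer_key = -1
          AND EXISTS (SELECT 1 FROM dim_customer AS d
                      WHERE d.customer_id = fact_sales.customer_id)
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, "
    "customer_id TEXT UNIQUE, name TEXT)"
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(-1, None, "Unknown"), (-2, None, "N/A"), (1, "C001", "Alice")],
)
# Hypothetical design: the fact row keeps the natural key next to the surrogate key.
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
    "customer_id TEXT, customer_key INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, "C001", 1), (2, "C002", -1), (3, None, -2)],
)

print(fix_late_arriving_customers(conn))  # 0: customer C002 has not arrived yet
conn.execute("INSERT INTO dim_customer VALUES (2, 'C002', 'Bob')")
print(fix_late_arriving_customers(conn))  # 1: sale 2 now points at Bob
```

Sale 3 stays on -2 throughout, so the job never wastes effort on it.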

And down the line, when someone else takes over support of this data mart, they'll be really thankful not to have to spend a huge amount of time working out whether those NULLs or -1s indicate a problem.
