
Database architecture, majority null fields in a column

I have a database table in MySQL. For a new feature, we can implement the change in two ways:

1. Add a new nullable column to the same table. The downside is that this column will be NULL 95-98% of the time.
2. Create a new table with a foreign key to the existing table.

So the two architectures would look something like this:

1. table1 - <id, ..., new_column>

2. table1 - <id, ...>, table2 - <id, table1_id, ...>
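To make the two layouts concrete, here is a minimal sketch of both. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name table1_wide, the name column, and the 4-in-100 fill rate are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Option 1: a nullable column on the existing table (hypothetical name table1_wide).
conn.execute(
    "CREATE TABLE table1_wide (id INTEGER PRIMARY KEY, name TEXT NOT NULL, new_column TEXT)"
)

# Option 2: the existing table stays as it is; the new data lives in table2.
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE table2 (
           id INTEGER PRIMARY KEY,
           table1_id INTEGER NOT NULL REFERENCES table1(id),
           new_column TEXT NOT NULL
       )"""
)

# Assumed data: only 4 of 100 rows carry a value for the new feature.
for i in range(1, 101):
    value = f"value-{i}" if i % 25 == 0 else None
    conn.execute(
        "INSERT INTO table1_wide (id, name, new_column) VALUES (?, ?, ?)",
        (i, f"row-{i}", value),
    )
    conn.execute("INSERT INTO table1 (id, name) VALUES (?, ?)", (i, f"row-{i}"))
    if value is not None:
        conn.execute(
            "INSERT INTO table2 (table1_id, new_column) VALUES (?, ?)", (i, value)
        )

null_count = conn.execute(
    "SELECT COUNT(*) FROM table1_wide WHERE new_column IS NULL"
).fetchone()[0]
side_rows = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]
print(null_count, side_rows)
```

With this data, option 1 stores a NULL in most rows, while option 2 stores only the rows that actually have a value.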

The first approach is denormalized, while the second is normalized. But since this is a real-world problem, it is okay to follow the denormalized approach sometimes.

I might be wrong in some of my assumptions about DB design. What do you think is a better approach to this kind of problem?

It would be really helpful if you could provide specific examples; "should I add a column that may be null" isn't easy to answer.

In very general terms, normalize until you can prove you have to do something else. Design your database for legibility and bug-resistance; adding an extra table is much less effort than working out why on earth your application suddenly reports incorrect data 12 months from now, after you change a bit of code that accidentally forgets about your denormalization.

So, is this nullable column an attribute of the entity? Not all people have a middle name, so a nullable middle_name column is perfectly reasonable. Or is it something that you're just attaching to the entity because it's convenient, but isn't really an attribute?

For instance, a person may have an employer, and that employer may have an address. Ideally, you'd create an employer table with an address attribute. Attaching employer_address to person might feel like a shortcut ("I don't care about anything other than the address, and I never need to know how many people work for that employer").

This may feel like you're saving yourself some effort, but it's less legible (future developers will wonder why you did this), more bug-prone (you may get incorrect or inconsistent addresses for a single employer), and harder to change in the future (good luck working out how many people work for a given employer based on the address alone).
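The person/employer case could be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module rather than MySQL, and the column names (employer_id, middle_name) and sample rows are assumptions, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The address is an attribute of the employer, so it lives on the employer row.
conn.execute(
    "CREATE TABLE employer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, address TEXT NOT NULL)"
)
# middle_name is a genuine (optional) attribute of a person, so it stays nullable here.
conn.execute(
    """CREATE TABLE person (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           middle_name TEXT,
           employer_id INTEGER REFERENCES employer(id)
       )"""
)

conn.execute("INSERT INTO employer (id, name, address) VALUES (1, 'Acme', '1 Main St')")
conn.executemany(
    "INSERT INTO person (name, middle_name, employer_id) VALUES (?, ?, ?)",
    [("Ann", None, 1), ("Bob", "Lee", 1), ("Cy", None, None)],
)

# An address change touches exactly one row, so it cannot become inconsistent.
conn.execute("UPDATE employer SET address = '2 Oak Ave' WHERE id = 1")

# Head count per employer is a plain filter on the foreign key.
headcount = conn.execute(
    "SELECT COUNT(*) FROM person WHERE employer_id = 1"
).fetchone()[0]
addresses = conn.execute(
    "SELECT DISTINCT e.address FROM person p JOIN employer e ON e.id = p.employer_id"
).fetchall()
print(headcount, addresses)
```

Had employer_address been stored on person instead, the same address change would need one update per employee, and a missed row would leave two addresses for one employer.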

"Vertical Partitioning" can be advantageous in these cases:

  • The column(s) in the second table are usually missing, so that table has fewer rows. Note: you can still get NULL for the missing rows by using LEFT JOIN.
  • The column(s) in the second table are bulky, but rarely used. There are performance disadvantages when doing SELECT * and some of the columns are TEXT / BLOB. Vertical partitioning may help with speed. (Picking an appropriate ROW_FORMAT in InnoDB virtually eliminates this advantage.)
  • The most common queries do not need the column(s) of the second table.
  • You need to add the column with no downtime. An ALTER .. ADD COLUMN .. on the main table, depending on the MySQL/MariaDB version, may block usage of it for a long time.

I suspect that only 1 table in 100 should be split this way. The split is confusing to readers, and the benefits I list above are rare and may not justify the effort.

The second table would have the same PRIMARY KEY as the main table, but without AUTO_INCREMENT. The two tables would not have the same secondary keys. Also note that you cannot have a composite index with columns from both tables.
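A minimal sketch of such a split, with the side table sharing the main table's primary key and a LEFT JOIN supplying NULL for rows that have no side-table entry. It runs on Python's built-in sqlite3 module rather than MySQL (so it uses SQLite's AUTOINCREMENT keyword, not MySQL's AUTO_INCREMENT), and the names main, main_extra, and notes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Main table: generates the ids.
conn.execute(
    "CREATE TABLE main (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)
# Side table: same primary key, never generated here, always copied from main.id.
conn.execute(
    """CREATE TABLE main_extra (
           id INTEGER PRIMARY KEY REFERENCES main(id),
           notes TEXT NOT NULL
       )"""
)

conn.executemany("INSERT INTO main (name) VALUES (?)", [("a",), ("b",), ("c",)])
# Only the row with id 2 has the rarely used column.
conn.execute("INSERT INTO main_extra (id, notes) VALUES (2, 'rarely used text')")

# LEFT JOIN keeps every main row; missing side rows come back as NULL (None in Python).
rows = conn.execute(
    """SELECT m.id, m.name, x.notes
       FROM main m
       LEFT JOIN main_extra x ON x.id = m.id
       ORDER BY m.id"""
).fetchall()
print(rows)
```

Queries that never need notes can read main alone and skip the join entirely.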

If the new column(s) are a bunch of "attributes", such as in a 'store' app, consider throwing them in a JSON column. This is open ended, but clumsy to use with WHERE or ORDER BY.
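As a rough illustration of the JSON approach and its clumsiness, the sketch below stores the attributes as JSON text and filters them in application code. It uses Python's built-in sqlite3 and json modules rather than MySQL; in MySQL itself you would more likely use a JSON column with functions such as JSON_EXTRACT. The product table, its sample rows, and the products_with helper are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The attributes column holds JSON text and is NULL when a product has no extra attributes.
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, attributes TEXT)"
)

products = [
    ("shirt", {"color": "red", "size": "M"}),
    ("mug", {"capacity_ml": 350}),
    ("sticker", None),
]
for name, attrs in products:
    encoded = json.dumps(attrs) if attrs is not None else None
    conn.execute(
        "INSERT INTO product (name, attributes) VALUES (?, ?)", (name, encoded)
    )


def products_with(key, value):
    """Return names of products whose JSON attributes contain key == value."""
    matches = []
    # Every row is fetched and decoded; no ordinary index can narrow this down.
    for name, raw in conn.execute("SELECT name, attributes FROM product ORDER BY id"):
        attrs = json.loads(raw) if raw is not None else {}
        if attrs.get(key) == value:
            matches.append(name)
    return matches


red = products_with("color", "red")
print(red)
```

New attributes need no schema change, which is the open-ended part; the cost is that filtering and sorting no longer look like a plain WHERE or ORDER BY on a column.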
