Database architecture, majority null fields in a column
I have a database table in MySQL. For a new feature, we can implement the change in one of two ways:
1. Add a new nullable column to the existing table. The drawback is that this column will be NULL in 95-98% of rows.
2. Create a new table with a foreign key to the existing table.
The two designs would look something like this:
1. table1 - <id, ..., new_column>
2. table1 - <id, ...>, table2 - <id, table1_id, ...>
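As a rough illustration of the two layouts, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL so that it runs without a server. The names table1_wide and new_value and the sample rows are made up for the example; only table1, table2, table1_id and new_column come from the question.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: nullable column on the existing table
    CREATE TABLE table1_wide (
        id INTEGER PRIMARY KEY,
        new_column TEXT              -- nullable by default
    );
    -- Option 2: separate table that references the existing table
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (
        id INTEGER PRIMARY KEY,
        table1_id INTEGER NOT NULL REFERENCES table1(id),
        new_value TEXT NOT NULL
    );
""")

# Three rows; only row 2 has a value for the new feature.
conn.executemany("INSERT INTO table1_wide (id, new_column) VALUES (?, ?)",
                 [(1, None), (2, "x"), (3, None)])
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO table2 (table1_id, new_value) VALUES (2, 'x')")

# Option 1 reads the column directly.
option1 = conn.execute(
    "SELECT id, new_column FROM table1_wide ORDER BY id").fetchall()

# Option 2 needs a LEFT JOIN; rows without a table2 entry come back as NULL.
option2 = conn.execute("""
    SELECT t1.id, t2.new_value
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t2.table1_id = t1.id
    ORDER BY t1.id
""").fetchall()

print(option1)
print(option2)
```

Both queries return the same three rows, with None for rows 1 and 3. The difference is where the "no value" case lives: as a stored NULL in option 1, or as a missing row in option 2.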
The first approach is denormalized, while the second is normalized. But since this is a real-world problem, it is sometimes okay to follow the denormalized approach.
I might be wrong in some of my assumptions about DB design. What do you think is the better approach for this kind of problem?
It would be really helpful if you could provide specific examples; "should I add a column that may be null" isn't easy to answer.
In very general terms, normalize until you can prove you have to do something else. Design your database for legibility and bug-resistance; adding an extra table is much less effort than working out why on earth your application suddenly reports incorrect data 12 months from now, after a code change that accidentally forgets about your denormalization.
So, is this nullable column an attribute of the entity? Not all people have a middle name, so a nullable column for that attribute is perfectly reasonable. Or is it something that you're just attaching to the entity because it's convenient, but that isn't really an attribute?
For instance, a person may have an employer, and that employer may have an address; ideally, you'd create an employer table with an address attribute. Attaching employer_address to person might feel like a shortcut ("I don't care about anything other than the address, and I never need to know how many people work for that employer").
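To make the person / employer example concrete, here is a minimal sketch, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The table name person_flat, the employer 'Acme', and all names and addresses are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shortcut: the employer address is copied onto every person
    CREATE TABLE person_flat (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        employer_address TEXT
    );
    INSERT INTO person_flat (name, employer_address) VALUES
        ('Ann', '1 Main Street'),
        ('Bob', '1 Main St.'),
        ('Cy',  NULL);

    -- Normalized: the address is stored once, on the employer
    CREATE TABLE employer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        employer_id INTEGER REFERENCES employer(id)
    );
    INSERT INTO employer (id, name, address)
        VALUES (1, 'Acme', '1 Main Street');
    INSERT INTO person (name, employer_id) VALUES
        ('Ann', 1), ('Bob', 1), ('Cy', NULL);
""")

# Ann and Bob share an employer, but the address was typed two ways,
# so grouping by address splits them into two "employers".
flat_counts = conn.execute("""
    SELECT employer_address, COUNT(*)
    FROM person_flat
    WHERE employer_address IS NOT NULL
    GROUP BY employer_address
    ORDER BY employer_address
""").fetchall()

# With an employer table the head count is a plain join.
normalized_counts = conn.execute("""
    SELECT e.name, COUNT(p.id)
    FROM employer AS e
    LEFT JOIN person AS p ON p.employer_id = e.id
    GROUP BY e.id, e.name
""").fetchall()

print(flat_counts)
print(normalized_counts)
```

In the flat table the one employer shows up as two address groups of one person each, while the normalized schema reports a single employer with two people.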
This may feel like you're saving yourself some effort, but it's less legible (future developers will wonder why you did this), more bug-prone (you may get incorrect or inconsistent addresses for a single employer), and harder to change in the future (good luck working out how many people work for a given employer based only on the address).
"Vertical Partitioning" can be advantageous in these cases:
1. The new column(s) are sparse: leave out the rows that have no value and get NULL by using LEFT JOIN.
2. You have a performance disadvantage with SELECT * and some of the columns are TEXT / BLOB. Vertical Partitioning may help you with speed. (ROW_FORMAT in InnoDB virtually eliminates this advantage.)
3. ALTER .. ADD COLUMN .. on the main table, depending on the MySQL/MariaDB version, may block usage of it for a long time.
I suspect that only 1 table in 100 should be split this way. It is confusing to readers, etc. The benefits listed above are rare and may not justify the effort.
The second table would have the same PRIMARY KEY as the main table, but without AUTO_INCREMENT. The two tables would not have the same secondary keys. And note that you cannot have a composite index that includes columns from both tables.
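A minimal sketch of that shared-key layout, with SQLite via Python's sqlite3 standing in for MySQL. The table names main_table and main_extra, the columns, and the index names are hypothetical. SQLite's AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT on the main table only; the second table always receives its key explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- AUTO_INCREMENT in MySQL
        name TEXT NOT NULL
    );
    -- Same key as main_table; the id is always supplied explicitly,
    -- copied from the main row that this row extends.
    CREATE TABLE main_extra (
        id INTEGER PRIMARY KEY REFERENCES main_table(id),
        notes TEXT NOT NULL
    );
    -- Secondary indexes are per table; one index cannot span both.
    CREATE INDEX idx_main_name ON main_table(name);
    CREATE INDEX idx_extra_notes ON main_extra(notes);
""")

cur = conn.execute("INSERT INTO main_table (name) VALUES ('first')")
first_id = cur.lastrowid
conn.execute("INSERT INTO main_table (name) VALUES ('second')")
conn.execute("INSERT INTO main_extra (id, notes) VALUES (?, ?)",
             (first_id, "rare detail"))

# The shared primary key allows at most one extra row per main row.
try:
    conn.execute("INSERT INTO main_extra (id, notes) VALUES (?, ?)",
                 (first_id, "again"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Main rows without an extra row still appear, with NULL for notes.
rows = conn.execute("""
    SELECT m.id, m.name, x.notes
    FROM main_table AS m
    LEFT JOIN main_extra AS x ON x.id = m.id
    ORDER BY m.id
""").fetchall()

print(duplicate_rejected)
print(rows)
```

Using the main table's key as the second table's primary key keeps the relationship one-to-one, which a separate surrogate id plus a plain foreign key would not enforce on its own.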
If the new column(s) are a bunch of "attributes", such as in a 'store' app, consider throwing them in a JSON column. This is open-ended, but clumsy to use with WHERE or ORDER BY.
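A minimal sketch of the JSON-column idea, using SQLite via Python's sqlite3 with the JSON stored as plain text. The product table, the attrs column, and the sample attributes are invented for the example. MySQL has a JSON data type and JSON functions for querying such a column; this sketch deliberately avoids them and decodes in application code, which is one way the clumsiness with WHERE and ORDER BY shows up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, attrs TEXT)"
)
conn.executemany(
    "INSERT INTO product (name, attrs) VALUES (?, ?)",
    [
        ("shirt", json.dumps({"color": "red", "size": "M"})),
        ("mug", json.dumps({"color": "blue", "volume_ml": 300})),
        ("pen", None),  # no extra attributes at all
    ],
)

def load(row):
    """Decode one (name, attrs) row; a NULL attrs becomes an empty dict."""
    name, attrs = row
    return name, (json.loads(attrs) if attrs else {})

products = [load(r) for r in
            conn.execute("SELECT name, attrs FROM product ORDER BY id")]

# Filtering and sorting on an attribute happen after decoding, not in SQL.
colored = sorted(
    (attrs["color"], name) for name, attrs in products if "color" in attrs
)

print(colored)
```

New attributes can be added without any schema change, but every query that depends on one has to reach inside the JSON value first.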