Database architecture, majority null fields in a column

Question

I have a database table in MySQL, according to a new feature, we can implement in two ways - 1. Either make a new column (nullable) in the same table itself, the con of this approach is - this column will have 95-98% of the times NULL entry. 2. Make a new table with the foreign key of the already existing table.

so the two architecture will look something like this -

1. table1 - <id, ..., new_column>

2. table1 - <id, ...>, table2 - <id, table1_id, ...>

The first approach follows a denormalized approach, while the second one follows a normalized one. But since this is a real-world problem, it is okay to follow the denormalized approach sometimes.

I might be wrong in some of my assumptions of DB design, what do you think is a better approach to solve such kind of problems?

Answer 1

It would be really helpful if you could provide specific examples - "should I add a column that may be null" isn't easy to answer.

In very general terms, normalize until you can prove you have to do something else. Design your database for legibility and bug-resistance; adding an extra table is much less effort than working out why on earth your application suddenly reports incorrect data in 12 months when you change a bit of code that accidentally forgets about your denormalization.

So, is this nullable column an attribute of the entity? Not all people have a middle name attribute - perfectly reasonable to have a nullable column. Or is it something that you're just attaching to the entity because it's convenient, but isn't really an attribute?

For instance, a person may have an employer , and that employer may have an address ; ideally, you'd create an employer table, with an address attribute; attaching employer_address to person might feel like a shortcut (I don't care about anything other than the address - I never need to know how many people work for that employer).

This may feel like you're saving yourself some effort - but it's less legible (so future developers will wonder why you did this), more bug prone (you may get incorrect or inconsistent addresses for a single employer), and harder to change in the future (good luck working out how many people work for a given employer just based on the address).

Answer 2

"Vertical Partitioning" can be advantageous in these cases

The column(s) in the second table are usually missing, so that table has fewer rows. Note: you can get NULL by using LEFT JOIN .
The column(s) in the second table are bulky, but rarely used. There are performance disadvantages when doing SELECT * and some of the columns are TEXT / BLOB . The Vertical Partitioning may help you with speed. (Picking an appropriate ROW_FORMAT in InnoDB virtually eliminates this advantage.)
The most common queries do not need the column(s) of the second table.
You gotta add the column with no downtime. An ALTER .. ADD COLUMN .. on the main table, depending on the MySQL/MariaDB version, may block usage of it for a long time.

I suspect that only 1 table in 100 should be split this way. It is confusing to readers, etc. The benefits I list above are rare, and the benefits may not justify the effort.

The second table would have the same PRIMARY KEY as the main table, but without AUTO_INCREMENT . The two tables would not have the same secondary keys. And note that you cannot have a composite index with columns from both tables.

If the new column(s) are a bunch of "attributes" such as in a 'store' app, consider throwing them in a JSON column. This is open ended, but clumsy to use with WHERE or ORDER BY .

Database architecture, majority null fields in a column

Question

2 answers

solution1
1 2018-08-22 10:19:30

solution2
0 2018-08-28 02:15:57

Database architecture, majority null fields in a column

Question

2 answers

solution1 1 2018-08-22 10:19:30

solution2 0 2018-08-28 02:15:57

solution1
1 2018-08-22 10:19:30

solution2
0 2018-08-28 02:15:57