
Database Schema - Create one large table or many?

I'm trying to decide on the best way to store entities which have very similar properties. The main difference is that each entity references other entities. I was going to set up the database as:

entity_a (1-1,000 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name|entity_b_id|entity_c_id|entity_d_id

entity_b (10,000-1,000,000 Records) [Data changes constantly]
id|created|updated|entity_b_id|category_id|name|entity_c_id|entity_e_id|entity_f_id

entity_c (10,000-10,000,000 Records) [Data changes constantly]
id|created|updated|entity_b_id|category_id|name|entity_a_id|entity_f_id

entity_d (0-1,000 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name

entity_e (1-100 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name|entity_a_id|entity_b_id

entity_f (0-50,000 Records) [Data frequently changes]
id|created|updated|entity_b_id|category_id|name|entity_c

entity_g (10-100 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name

entity_h (10-1,000 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name|entity_e_id

entity_i (1-10 Records) [Data rarely changes]
id|created|updated|entity_b_id|category_id|name

But it's been suggested that it would be easier to manage one large table as:

ent (20,000-11,000,000 Records)
id|created|updated|ent_id(b)|category_id|name|ent_id(a)|ent_id(b)|ent_id(c)|ent_id(d)|ent_id(e)|ent_id(f)

A concern with this second method is table size: the IDs will be int(11), and there will be six such ID columns, most of which will simply be set to 0.
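To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes MySQL, where an INT column takes 4 bytes regardless of the display width in int(11), and it ignores index and per-row overhead, so treat the result as a lower bound rather than a measurement. The function name is just for illustration.

```python
# Rough estimate of the raw space taken by six mostly-unused reference columns.
# Assumption: MySQL INT is 4 bytes; the (11) is only a display width.
INT_BYTES = 4
REFERENCE_COLUMNS = 6


def reference_column_bytes(rows: int) -> int:
    """Raw bytes used by the reference columns, ignoring index and row overhead."""
    return rows * REFERENCE_COLUMNS * INT_BYTES


# The two ends of the row-count range quoted for the single `ent` table.
for rows in (20_000, 11_000_000):
    megabytes = reference_column_bytes(rows) / 1_000_000
    print(f"{rows:>12,} rows: {megabytes:.1f} MB")
```

At the upper end of 11,000,000 rows that comes to roughly 264 MB for those six columns alone.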

But my main concern is speed of access, as the records will be accessed very frequently by many users at once. I'm using CodeIgniter and hope to use its caching abilities to take as much load off the database as possible, but that will be limited because some of the data will change from second to second.

Any help would be most appreciated.

I think it's hard to anticipate the actual performance of one vs the other since it's dependent on so many things.

Several considerations:

How important is the difference between entities? If you find yourself often selecting just one type of entity per query then the normalized solution is likely faster.

If you have queries that filter on something other than the shared columns, for example entity_a rows where entity_c IN (something), you'll want an index on the entity_c column.
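A minimal sketch of that index, using Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere. The index name idx_entity_a_entity_c and the sample data are made up for illustration, and the table is trimmed to three columns. The query planner output will look different in MySQL (use EXPLAIN there), but the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_a (
    id INTEGER PRIMARY KEY,
    name TEXT,
    entity_c_id INTEGER
);
-- Hypothetical index name; it covers the column used in the IN (...) filter.
CREATE INDEX idx_entity_a_entity_c ON entity_a (entity_c_id);
""")

# Made-up sample rows: 1000 entity_a records spread over 50 entity_c ids.
conn.executemany(
    "INSERT INTO entity_a (name, entity_c_id) VALUES (?, ?)",
    [(f"a{i}", i % 50) for i in range(1000)],
)

query = "SELECT id, name FROM entity_a WHERE entity_c_id IN (1, 2, 3)"

# Ask SQLite how it would run the query; the last column holds the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

matching_rows = conn.execute(query).fetchall()
print(len(matching_rows), "rows matched")
```

If the plan mentions the index name rather than a full table scan, the filter is being served by the index.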

entity_c is very large. If it gets updated a lot and queried very rarely, that's a cause for concern if you're going for the de-normalized version, because those writes would land in the same table that every other entity is read from.

If you're doing a lot of JOINs I'm pretty sure the normalized form is faster.

My advice would be: use the normalized form. If you're seeing performance issues you could look at this solution.

You could also go for a hybrid solution. Since b and c change often and the others don't, you could make two tables along those lines: one for the frequently changing entities and one for the rest. Or give b and c their own tables but keep the others in one.
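Here is a rough sketch of the second hybrid variant, again using Python's sqlite3 module in place of MySQL. The table name ent_static and the entity_type discriminator column are hypothetical names chosen for this example, and the reference columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The frequently changing entities keep their own tables.
CREATE TABLE entity_b (
    id INTEGER PRIMARY KEY, created TEXT, updated TEXT,
    category_id INTEGER, name TEXT
);
CREATE TABLE entity_c (
    id INTEGER PRIMARY KEY, created TEXT, updated TEXT,
    category_id INTEGER, name TEXT
);
-- The rarely changing entities share one table.
-- entity_type is a hypothetical discriminator column ('a', 'd', 'e', ...).
CREATE TABLE ent_static (
    id INTEGER PRIMARY KEY, created TEXT, updated TEXT,
    category_id INTEGER, name TEXT,
    entity_type TEXT NOT NULL
);
CREATE INDEX idx_ent_static_type ON ent_static (entity_type);
""")

# Made-up rows, just to show how one entity type is selected from the shared table.
conn.executemany(
    "INSERT INTO ent_static (name, entity_type) VALUES (?, ?)",
    [("first a", "a"), ("first d", "d"), ("second a", "a")],
)

names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM ent_static WHERE entity_type = ? ORDER BY id", ("a",)
    )
]
print(names)
```

The heavy write traffic on b and c stays out of the shared table, while the small, mostly static entities can be cached aggressively.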


 