简体   繁体   中英

Most efficient database design for a blog (posts and comments)

What would be the best way of designing a database to store blog posts and comments? I am currently thinking one table for posts, and another for comments, each with a post ID.

It seems to me, however, trawling through a large table of comments to find those for the relevant post would be expensive, and would be done every time a blog post is loaded (perhaps with some amount of caching).

Is there a better way?

It seems to me, however, trawling through a large table of comments

All the database vendors agree with you.

They offer "indexes" to limit this.

Every database system you would be using to implement your blog will use indexing . What this means is that, rather than "trawling through a large table", your database system maintains a seperate list of comments and which posts they are associated with, much like the index at the back of a book. This allows the database system to load comments associated with a post extremely quickly, and I don't see any problems with your proposed design for a blog of any size.

Indexes are routinely used to associate tables with millions of rows with other tables with millions of rows - you would have to have an exceptionally large blog to require denormalization of comments, and even still, caching would probably serve you far better than denormalizing the database.

You will need to define an index on your comments table, and associate it with whatever column holds the Post ID. How that's done is dependent on what database system you are using.

try something like this:

Blog
BlogID     int auto number PK
BlogName   string
...

BlogPost
BlogPostID   int auto number PK
BlogID       int FK to Blog.BlogID, index
BlogContent  string
....

Comment
CommentID       int auto number PK
BlogPostID      int FK to BlogPost.BlogPostID, index   
ReplyToCommentID int FK to Comment.CommentID  <<for comments on comments
...

trawling through a large table of comments to find those for the relevant post would be expensive,

An index is always there to rescue you! First index on postId and another of commentdate (desc)

Okay, let's see.

trawling through a large table of comments to find those for the relevant post would be expensive

Why do you think it'd be expensive? Because you possibly believe that a linear search will be done every time taking O(n) time. For a billion comments, a billion iterations will be done.

Now suppose a binary search tree is constructed for comment_ID. To look up any comment, you need log(n) time [base 2]. So for even 1 billion comments, only around 32 iterations will be needed.

Now consider a slightly modified BST, where each node contains k elements instead of 1 (in a list) and has k+1 children nodes. The same properties of BST are followed in this data structure as well. What we've got here is called a B-tree. More reading : GeeksForGeeks - B Tree Introduction

For a B-Tree, the lookup time is log(n) [base k]. Hence, if k=10, for 1 billion entries, only 9 iterations will be needed.

All databases save indexes for primary keys in B-Trees. Hence, the stated task would not be expensive, and you should go ahead and design the database the way it seemed obvious.

PS: You can build an index on any column of the table. By default primary key indexes are already stored. But be careful, do not make unnecessary indexes as they take up disk space.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM