
MongoDB architecture: how to store a large amount of arrays or sub documents in a scalable way

I am currently working on a blogging app in which users can create their own blogs, and each blog contains blogposts. I am trying to design a database that stays scalable when a blog has a lot of blogposts.

So is it better to structure my database as this:

blog1 : {
 blogname : 'blog1',
 blogposts: [array of blogposts] 
},

blog2 : {
 blogname : 'blog2',
 blogposts: [array of blogposts] 
}

Or should I create a separate collection with all the blogposts, something like this:

blogpost1: {
 id: 'blogpost1',
 content: {blogpost content in json format}
},
blogpost2: {
 id: 'blogpost2',
 content: {blogpost content in json format}
}

and reference them in the blog collection.

I want to know which choice would be superior when there are a lot of blogposts. I remember reading somewhere in the MongoDB docs that it's not recommended to have arrays within a document that can grow without bound, so approach #1 is not ideal, right?

When creating databases, I find it useful to think about the requests I would be making.

A blogging app user would want to search all blogs or find a blogger by some criteria.

In this case separate collections for bloggers and blogs would work best. Then structure your documents so that the bloggers link to their blogs and vice versa.

This can be done with Mongoose Schemas ( https://mongoosejs.com/docs/index.html ).

// models/blogger.js
const mongoose = require('mongoose')

const bloggerSchema = new mongoose.Schema({
  blogs: [
    {
      type: mongoose.Schema.Types.ObjectId,
      ref: 'Blog'
    }
  ],
  name: String
})

bloggerSchema.set('toJSON', {
  transform: (document, returnedObject) => {
    const blogger = returnedObject

    blogger.id = blogger._id.toString()
    delete blogger._id
    delete blogger.__v
  }
})

module.exports = mongoose.model('Blogger', bloggerSchema)

Then use populate with your request:

// controllers/bloggers.js
const bloggersRouter = require('express').Router()
const Blogger = require('../models/blogger')

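bloggersRouter.get('/', async (request, response, next) => {
  try {
    // Populate each blogger's blogs, but fetch only the title field
    const bloggers = await Blogger.find({}).populate('blogs', { title: 1 })
    response.json(bloggers.map(blogger => blogger.toJSON()))
  } catch (error) {
    // Pass database errors on to the Express error handler
    next(error)
  }
})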

module.exports = bloggersRouter

This way you don't have to add the blogs in their entirety to the blogger document. You can include just the title, or anything else that you need for the bloggers' initial view.

You could also think about limiting the length of a blog, so you can have more control over the data and then think about the options Joe suggested.

Why does it have to be one or the other?

Storing the blog posts in the same document as the blog is great as long as the individual posts are not very large, and there aren't very many of them.

Storing the posts in a separate collection is good for bigger posts and busy blogs but adds an additional query or lookup to retrieve.
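As a rough illustration of the separate-collection option, each post can carry a reference to its blog instead of the blog holding an ever-growing array. The sketch below uses plain in-memory arrays as stand-ins for the two collections. The field names blogId and createdAt and the helper latestPosts are assumptions made for illustration. Against a real database the same read would be a query filtered on blogId with a sort and a limit, ideally backed by an index on blogId.

```javascript
// In-memory stand-ins for two collections (hypothetical data and field names).
const blogs = [
  { _id: 'blog1', blogname: 'blog1' },
  { _id: 'blog2', blogname: 'blog2' }
]

// Each post points at its parent blog, so the blog document never grows.
const blogposts = [
  { _id: 'p1', blogId: 'blog1', title: 'First', createdAt: 1 },
  { _id: 'p2', blogId: 'blog1', title: 'Second', createdAt: 2 },
  { _id: 'p3', blogId: 'blog2', title: 'Other blog', createdAt: 3 },
  { _id: 'p4', blogId: 'blog1', title: 'Third', createdAt: 4 }
]

// Hypothetical helper: return the newest `limit` posts of one blog.
function latestPosts(allPosts, blogId, limit) {
  return allPosts
    .filter(post => post.blogId === blogId)      // posts of this blog only
    .sort((a, b) => b.createdAt - a.createdAt)   // newest first
    .slice(0, limit)                             // one page
}

// The "additional query": first find the blog, then fetch its posts.
const blog = blogs.find(b => b.blogname === 'blog1')
const page = latestPosts(blogposts, blog._id, 2)
console.log(page.map(post => post.title))
```

The cost is the second lookup, but the number of posts per blog no longer affects the size of the blog document.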

I would think it is expected that your users' output will run the gamut from sparse to prolific, and individual posts will range from a few dozen bytes to many megabytes.

For small posts on not very active blogs, store the posts in the blog document for efficient retrieval.

For busy blogs, store them in an archive collection. Perhaps store the most recent couple of posts, or the most popular posts, in the blog document so you don't have to refer to the other collection every time.
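One possible way to keep only the latest few posts inside the blog document is MongoDB's $push operator with the $each and $slice modifiers, which appends an element and trims the array in the same update. The sketch below only builds the documents that would be sent to the database, so it runs without a connection. The field names recentPosts and postCount, the cap of 3, and the helper buildNewPostWrites are all assumptions.

```javascript
// Assumed cap on how many post summaries live inside the blog document.
const MAX_RECENT = 3

// Hypothetical helper: describe the two writes needed when a post is published.
function buildNewPostWrites(blogId, post) {
  return {
    // 1. The full post goes to the separate (archive) collection.
    archiveInsert: { ...post, blogId },
    // 2. A small summary is appended to the blog document.
    blogFilter: { _id: blogId },
    blogUpdate: {
      $push: {
        recentPosts: {
          $each: [{ postId: post._id, title: post.title }],
          $slice: -MAX_RECENT // negative value keeps only the last MAX_RECENT elements
        }
      },
      $inc: { postCount: 1 }
    }
  }
}

const writes = buildNewPostWrites('blog1', {
  _id: 'p42',
  title: 'Hello',
  content: 'post body'
})
console.log(JSON.stringify(writes.blogUpdate))
```

With the Node.js driver these objects would typically be passed to insertOne on the posts collection and updateOne on the blogs collection. The embedded array stays bounded no matter how prolific the blogger is.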

You will also need to figure out how to split a post between documents. MongoDB has a 16MB limit on a single document, so if any of your users make huge posts, you'll need to be able to store them somewhere.
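Splitting an oversized post could look roughly like the sketch below, which cuts the UTF-8 bytes of the content into ordered chunk documents and joins them again on read. The chunk size, the field names postId, seq and data, and the helpers splitPost and joinPost are assumptions. MongoDB's GridFS applies the same general idea to files, so it may be worth evaluating before writing this by hand.

```javascript
// Assumed chunk size, kept well below the 16 MB document limit
// to leave room for the other fields in each chunk document.
const CHUNK_BYTES = 8 * 1024 * 1024

// Hypothetical helper: split post content into ordered chunk documents.
function splitPost(postId, content, chunkBytes = CHUNK_BYTES) {
  const bytes = Buffer.from(content, 'utf8')
  const chunks = []
  for (let offset = 0, seq = 0; offset < bytes.length; offset += chunkBytes, seq += 1) {
    // Chunks hold raw bytes, so a multi-byte character may span two chunks;
    // that is fine because decoding only happens after all bytes are rejoined.
    chunks.push({ postId, seq, data: bytes.subarray(offset, offset + chunkBytes) })
  }
  return chunks
}

// Hypothetical helper: rebuild the content from chunks in any order.
function joinPost(chunks) {
  const ordered = [...chunks].sort((a, b) => a.seq - b.seq)
  return Buffer.concat(ordered.map(chunk => chunk.data)).toString('utf8')
}

// Tiny chunk size here just to make the splitting visible.
const chunks = splitPost('p1', 'héllo wörld', 4)
console.log(chunks.length, joinPost(chunks) === 'héllo wörld')
```

Each chunk would be stored as its own document and fetched by postId, sorted by seq.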

Your question as written seems to be asking whether it is better to follow a relational model or a strict document model. I think in reality neither is a perfect fit here, and a hybrid, flexible approach would work out better.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.

 