

MongoDB architecture: how to store a large amount of arrays or sub documents in a scalable way

I am currently working on a blogging app in which users can create their own blogs, and each blog contains blog posts. I'm thinking about how to architect a database that stays scalable when each blog has a lot of blog posts.

So is it better to structure my database like this:

blog1 : {
 blogname : 'blog1',
 blogposts: [array of blogposts] 
},

blog2 : {
 blogname : 'blog2',
 blogposts: [array of blogposts] 
}

Or should I create a separate collection with all the blog posts, something like this:

blogpost1: {
 id: 'blogpost1',
 content: {blogpost content in json format}
},
blogpost2: {
 id: 'blogpost2',
 content: {blogpost content in json format}
}

and reference them in the blog collection?

I want to know which choice would be better when there are a lot of blog posts. I remember reading somewhere in the MongoDB docs that it's not recommended to have arrays inside a document that can grow without bound, so approach #1 is not ideal, right?
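One concrete reason unbounded arrays are discouraged is MongoDB's 16 MB limit on a single document: every embedded post counts toward that cap, so the number of posts one blog document can hold is bounded by the average post size. The sketch below estimates that bound. It uses the UTF-8 length of the JSON serialization as a rough stand-in for the real BSON size (an assumption, since BSON encodes differently), and the 5 KB average post size and 1 KB overhead are made-up example values.

```javascript
// MongoDB's maximum BSON document size: 16 MB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Rough size estimate: UTF-8 byte length of the JSON serialization.
// Only a proxy; the actual BSON encoding will differ somewhat.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// How many posts of about `avgPostBytes` each fit in one blog document,
// after reserving `overheadBytes` for the blog's own fields?
function maxEmbeddedPosts(avgPostBytes, overheadBytes = 1024) {
  return Math.floor((MAX_DOC_BYTES - overheadBytes) / avgPostBytes);
}

console.log(approxDocBytes({ blogname: 'blog1', blogposts: [] })); // 35
// Hypothetical example: posts averaging about 5 KB each.
console.log(maxEmbeddedPosts(5 * 1024)); // 3276
```

A few thousand average-sized posts is a real ceiling for a prolific blogger, and a handful of very large posts would hit it much sooner.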

When designing a database, I find it useful to think about the requests I will be making.

A user of a blogging app would want to search all blogs or find a blogger by some criteria.

In this case, separate collections for bloggers and blogs would work best. Then structure your documents so that bloggers link to their blogs and vice versa.

This can be done with Mongoose schemas (https://mongoosejs.com/docs/index.html).

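// models/blogger.js
const mongoose = require('mongoose')

// A blogger stores only references (ObjectIds) to their blogs,
// not copies of the blog documents themselves.
const bloggerSchema = new mongoose.Schema({
  blogs: [
    {
      type: mongoose.Schema.Types.ObjectId,
      ref: 'Blog' // must match the name the Blog model is registered under
    }
  ],
  name: String
})

// Expose `id` as a string and hide Mongo-internal fields in JSON output
bloggerSchema.set('toJSON', {
  transform: (document, returnedObject) => {
    returnedObject.id = returnedObject._id.toString()
    delete returnedObject._id
    delete returnedObject.__v
  }
})

module.exports = mongoose.model('Blogger', bloggerSchema)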
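The Blogger model above covers only one direction of the link. For the "vice versa" direction, each blog would also store its blogger's id, and both sides have to be kept in sync when ownership changes. The plain-JavaScript sketch below shows that bookkeeping, with in-memory Maps standing in for the two collections; the `blogger` field on a blog and the `linkBlog` helper are hypothetical names, and in a real app each mutation would be a database update.

```javascript
// In-memory stand-ins for the "bloggers" and "blogs" collections (hypothetical data).
const bloggers = new Map([
  ['u1', { id: 'u1', name: 'Ada', blogs: [] }],
  ['u2', { id: 'u2', name: 'Grace', blogs: [] }],
]);
const blogs = new Map([['b1', { id: 'b1', title: 'Notes', blogger: null }]]);

// Make `bloggerId` the owner of `blogId`, keeping both references consistent:
// the blogger lists the blog, and the blog points back at its blogger.
function linkBlog(bloggerId, blogId) {
  const blogger = bloggers.get(bloggerId);
  const blog = blogs.get(blogId);
  if (!blogger || !blog) throw new Error('unknown blogger or blog id');

  // If the blog previously belonged to someone else, remove it from their list.
  if (blog.blogger && blog.blogger !== bloggerId) {
    const previous = bloggers.get(blog.blogger);
    if (previous) previous.blogs = previous.blogs.filter(id => id !== blogId);
  }

  if (!blogger.blogs.includes(blogId)) blogger.blogs.push(blogId);
  blog.blogger = bloggerId;
}

linkBlog('u1', 'b1');
console.log(bloggers.get('u1').blogs); // [ 'b1' ]
console.log(blogs.get('b1').blogger);  // u1
```

Because the two sides are written separately, a failure between the two updates can leave them inconsistent. Storing the link on only one side avoids that, at the cost of an extra query when you need the other direction.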

Then use populate in your request:

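// controllers/bloggers.js
const bloggersRouter = require('express').Router()
const Blogger = require('../models/blogger')

// GET /: list all bloggers, replacing each blog ObjectId with the
// referenced Blog document, restricted to its `title` (plus `_id`)
bloggersRouter.get('/', async (request, response) => {
  const bloggers = await Blogger.find({}).populate(
    'blogs', {
      title: 1
    }
  )
  // toJSON() applies the transform defined in models/blogger.js
  response.json(bloggers.map(blogger => blogger.toJSON()))
})

module.exports = bloggersRouter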

This way you don't have to include the blogs in their entirety in the blogger document; you can return just the title, or whatever else you need for the blogger's initial view.
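Conceptually, `populate('blogs', { title: 1 })` replaces each stored ObjectId with the referenced blog, keeping only the selected fields (`_id` is kept by default). The plain-JavaScript sketch below imitates that result on in-memory data so the shape of the response is easy to see. It is not how Mongoose works internally (Mongoose runs a separate query against the referenced collection), and `populateBlogs` and the sample data are hypothetical.

```javascript
// In-memory stand-ins for the two collections (hypothetical data).
const blogsById = new Map([
  ['b1', { _id: 'b1', title: 'Notes', body: 'a long body...' }],
  ['b2', { _id: 'b2', title: 'Recipes', body: 'another long body...' }],
]);
const bloggers = [{ _id: 'u1', name: 'Ada', blogs: ['b1', 'b2'] }];

// Replace each blog id with a projection of the referenced blog.
// As with an inclusion projection, `_id` is kept alongside the selected fields.
function populateBlogs(bloggerDocs, fields) {
  return bloggerDocs.map(blogger => ({
    ...blogger,
    blogs: blogger.blogs
      .map(id => blogsById.get(id))
      .filter(Boolean) // drop references to blogs that no longer exist
      .map(blog => {
        const picked = { _id: blog._id };
        for (const f of fields) picked[f] = blog[f];
        return picked;
      }),
  }));
}

console.log(JSON.stringify(populateBlogs(bloggers, ['title'])));
// [{"_id":"u1","name":"Ada","blogs":[{"_id":"b1","title":"Notes"},{"_id":"b2","title":"Recipes"}]}]
```

The large `body` field never leaves the blogs collection, which is the point of selecting only `title` for the initial view.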

You could also consider limiting the length of a blog, so that you have more control over the data, and then think about the options Joe suggested.
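As a sketch of "limiting the length of a blog", a check like the one below could run before a post is saved. The limits `MAX_POSTS_PER_BLOG` and `MAX_POST_BYTES` are hypothetical values for illustration; neither MongoDB nor Mongoose imposes them. In a Mongoose app the same checks could live in a custom schema validator.

```javascript
// Hypothetical limits; tune them to your own data.
const MAX_POSTS_PER_BLOG = 500;
const MAX_POST_BYTES = 64 * 1024; // 64 KB of UTF-8 text per post

// Returns null if the post may be added, otherwise a reason string.
function checkNewPost(blog, postContent) {
  if (blog.blogposts.length >= MAX_POSTS_PER_BLOG) {
    return `blog already has ${MAX_POSTS_PER_BLOG} posts`;
  }
  const bytes = Buffer.byteLength(postContent, 'utf8');
  if (bytes > MAX_POST_BYTES) {
    return `post is ${bytes} bytes; limit is ${MAX_POST_BYTES}`;
  }
  return null;
}

const blog = { blogname: 'blog1', blogposts: [] };
console.log(checkNewPost(blog, 'Hello, world'));      // null
console.log(checkNewPost(blog, 'x'.repeat(70000)));   // post is 70000 bytes; limit is 65536
```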

Why does it have to be one or the other?

Storing the blog posts in the same document as the blog works well as long as the individual posts are not very large and there aren't very many of them.

Storing the posts in a separate collection works well for larger posts and busy blogs, but it adds an extra query or lookup to retrieve them.

I would expect your users' output to run the gamut from sparse to prolific, with individual posts ranging from a few dozen bytes to many megabytes.

For small posts on blogs that are not very active, store the posts in the blog document for efficient retrieval.
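The trade-off described so far can be reduced to a small placement rule. This is only a sketch: `choosePlacement` is a hypothetical helper, and the thresholds (16 KB for a "small" post, 100 posts for a "busy" blog) are guesses that would need tuning against real usage.

```javascript
// Decide where a new post should live, based on its size and how busy the blog is.
// Both thresholds are illustrative assumptions, not MongoDB recommendations.
function choosePlacement(postBytes, existingPostCount,
                         { smallPostBytes = 16 * 1024, busyPostCount = 100 } = {}) {
  const isSmall = postBytes <= smallPostBytes;
  const isQuietBlog = existingPostCount < busyPostCount;
  return isSmall && isQuietBlog ? 'embed' : 'separate-collection';
}

console.log(choosePlacement(2000, 10));      // embed
console.log(choosePlacement(2000, 250));     // separate-collection
console.log(choosePlacement(5_000_000, 3));  // separate-collection
```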

For busy blogs, store the posts in an archive collection. You might keep the most recent few posts, or the most popular ones, in the blog document so that you don't have to query the other collection every time.
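A sketch of keeping only the most recent posts embedded in the blog document while older ones move to an archive collection. Plain arrays stand in for the blog document and the archive collection, and `RECENT_LIMIT` and `addPost` are hypothetical names. With MongoDB itself, `$push` combined with `$each` and `$slice` can cap an embedded array in a single update, but the posts trimmed that way are discarded, so they would still need to be written to the archive separately.

```javascript
const RECENT_LIMIT = 3; // hypothetical: how many posts stay embedded in the blog

// Add a post to the blog's embedded "recent" list; anything older than the
// newest RECENT_LIMIT posts is moved into the archive.
function addPost(blog, archive, post) {
  blog.recentPosts.push(post);
  while (blog.recentPosts.length > RECENT_LIMIT) {
    const oldest = blog.recentPosts.shift();
    // Archived posts keep a reference back to their blog.
    archive.push({ ...oldest, blogname: blog.blogname });
  }
}

const blog = { blogname: 'blog1', recentPosts: [] };
const archive = []; // stand-in for a separate posts/archive collection

for (let i = 1; i <= 5; i++) addPost(blog, archive, { title: `post ${i}` });

console.log(blog.recentPosts.map(p => p.title)); // [ 'post 3', 'post 4', 'post 5' ]
console.log(archive.map(p => p.title));          // [ 'post 1', 'post 2' ]
```

Reading a blog's front page then touches only the blog document; older posts are fetched from the archive on demand.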

You will also need to figure out how to split a post across documents. MongoDB limits a single document to 16 MB, so if any of your users make huge posts, you'll need somewhere to store them.
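One way to split an oversized post is to cut its text into pieces below a byte budget and store each piece as its own document with a sequence number, similar in spirit to how MongoDB's GridFS stores large files as chunks. The `{ postId, n, text }` document shape, the 1 MB default chunk size, and the helper names are assumptions for illustration.

```javascript
// Split `text` into pieces whose UTF-8 encoding is at most `maxBytes` each,
// never cutting a character in half (assumes maxBytes >= 4, the largest
// UTF-8 character). for...of iterates whole code points, so emoji stay intact.
function chunkText(text, maxBytes) {
  const chunks = [];
  let current = '';
  let currentBytes = 0;
  for (const ch of text) {
    const chBytes = Buffer.byteLength(ch, 'utf8');
    if (currentBytes + chBytes > maxBytes && current !== '') {
      chunks.push(current);
      current = '';
      currentBytes = 0;
    }
    current += ch;
    currentBytes += chBytes;
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

// Hypothetical chunk documents: one per piece, ordered by `n`.
function toChunkDocs(postId, text, maxBytes = 1024 * 1024) {
  return chunkText(text, maxBytes).map((piece, n) => ({ postId, n, text: piece }));
}

// Reassemble by sorting on the sequence number and concatenating.
function fromChunkDocs(docs) {
  return [...docs].sort((a, b) => a.n - b.n).map(d => d.text).join('');
}

// Tiny 4-byte chunks just to make the splitting visible.
const docs = toChunkDocs('post42', 'héllo wörld', 4);
console.log(docs.map(d => d.text)); // [ 'hél', 'lo w', 'örl', 'd' ]
console.log(fromChunkDocs(docs));   // héllo wörld
```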

As written, your question seems to ask whether it is better to follow a relational model or a strict document model. In reality I think neither is a perfect fit here, and a hybrid, flexible approach would work better.

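Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM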