
Mongoose populate vs object nesting

Is there any performance difference (query processing time) between using Mongoose population and embedding objects directly? When should each be used?

Mongoose population example:

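var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// Each person holds references (ObjectIds) to documents in the Story collection
var personSchema = Schema({
  _id     : Number,
  name    : String,
  stories : [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});

// Each story points back to its creator by the person's numeric _id
var storySchema = Schema({
  _creator : { type: Number, ref: 'Person' },
  title    : String
});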

Mongoose object nesting example:

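// The embedded schema must be defined before the schema that embeds it.
// _creator is omitted here: the enclosing person document already identifies
// the creator, and two schemas cannot embed each other.
var storySchema = Schema({
  title    : String
});

// Each person document carries full story sub-documents
var personSchema = Schema({
  _id     : Number,
  name    : String,
  stories : [storySchema]
});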

The first thing to understand about mongoose population is that it is not magic, but just a convenience method that allows you to retrieve related information without doing it all yourself.

The concept is for use where you decide you need to place data in a separate collection rather than embedding it. Your main considerations should typically be document size, or whether the related information is subject to frequent updates that would make maintaining embedded data unwieldy.

The "not magic" part is that, under the covers, when you "reference" another source the populate function issues one or more additional queries to that "related" collection and "merges" the results into the parent object you have retrieved. You could do this yourself, but the method is there for convenience. The obvious "performance" consideration is that retrieving all the information never takes a single round trip to the database (MongoDB instance). There is always more than one.

As a sample, take two collections. First the order:

{ 
    "_id": ObjectId("5392fea00ff066b7d533a765"),
    "customerName": "Bill",
    "items": [
        ObjectId("5392fee10ff066b7d533a766"),
        ObjectId("5392fefe0ff066b7d533a767")
    ]
}

And the items:

{ "_id": ObjectId("5392fee10ff066b7d533a766"), "prod": "ABC", "qty": 1 }
{ "_id": ObjectId("5392fefe0ff066b7d533a767"), "prod": "XYZ", "qty": 2 }

The "best" that can be done by a "referenced" model, or by populate under the hood, is this:

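// First query: fetch the parent order
var order = db.orders.findOne({ "_id": ObjectId("5392fea00ff066b7d533a765") });
// Second query: fetch the referenced items and merge them into the order
order.items = db.items.find({ "_id": { "$in": order.items } }).toArray();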

So there are clearly "at least" two queries and operations in order to "join" that data.

The embedding concept is essentially the MongoDB answer to how to deal with not supporting "joins" [1]. Rather than splitting data into normalized collections, you try to embed the "related" data directly within the document that uses it. The advantages are a single "read" operation for retrieving the "related" information, and a single point of "write" for updating both "parent" and "child" entries. That said, it is often not possible to write to "many" children at once without processing "lists" on the client or otherwise accepting "multiple" write operations, preferably as a "batch".

The data then looks like this (compared to the example above):

{ 
    "_id": ObjectId("5392fea00ff066b7d533a765"),
    "customerName": "Bill",
    "items": [
        { "_id": ObjectId("5392fee10ff066b7d533a766"), "prod": "ABC", "qty": 1 },
        { "_id": ObjectId("5392fefe0ff066b7d533a767"), "prod": "XYZ", "qty": 2 }
    ]
}

Therefore actually fetching the data is just a matter of:

db.orders.findOne({ "_id": ObjectId("5392fea00ff066b7d533a765") });
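
To make the round-trip difference concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB: the `FakeDb` class and its `findOne`/`findIn` helpers are hypothetical stand-ins that only count how many queries each model needs, and the short string ids replace the ObjectIds from the samples above.

```javascript
// Hypothetical in-memory stand-in for a database; it only counts "round trips".
class FakeDb {
  constructor(collections) {
    this.collections = collections;
    this.roundTrips = 0;
  }
  // Return the first document in collection `name` whose _id equals `id`.
  findOne(name, id) {
    this.roundTrips += 1;
    return this.collections[name].find((doc) => doc._id === id) || null;
  }
  // Return every document in `name` whose _id is listed in `ids` (like $in).
  findIn(name, ids) {
    this.roundTrips += 1;
    return this.collections[name].filter((doc) => ids.includes(doc._id));
  }
}

// Referenced model: the order only stores the ids of its items.
const referencedDb = new FakeDb({
  orders: [{ _id: "o1", customerName: "Bill", items: ["i1", "i2"] }],
  items: [
    { _id: "i1", prod: "ABC", qty: 1 },
    { _id: "i2", prod: "XYZ", qty: 2 },
  ],
});

// Embedded model: the order carries the item documents itself.
const embeddedDb = new FakeDb({
  orders: [
    {
      _id: "o1",
      customerName: "Bill",
      items: [
        { _id: "i1", prod: "ABC", qty: 1 },
        { _id: "i2", prod: "XYZ", qty: 2 },
      ],
    },
  ],
});

// Roughly what populate has to do: one query for the parent, one for the children.
function loadReferencedOrder(db, orderId) {
  const order = db.findOne("orders", orderId);
  const items = db.findIn("items", order.items);
  return { ...order, items };
}

// With embedding the children arrive with the parent.
function loadEmbeddedOrder(db, orderId) {
  return db.findOne("orders", orderId);
}

const viaReference = loadReferencedOrder(referencedDb, "o1");
const viaEmbedding = loadEmbeddedOrder(embeddedDb, "o1");

console.log(referencedDb.roundTrips); // 2
console.log(embeddedDb.roundTrips); // 1
```

Both calls end up with the same order object; the only difference the sketch measures is the number of queries issued.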

The pros and cons of either will always largely depend on the usage pattern of your application. But at a glance:

Embedding

  • The total document size with embedded data stays under 16MB of storage (the BSON limit), and (as a guideline) its arrays do not grow to 500 or more entries.

  • Embedded data generally does not require frequent changes. So you can live with the "duplication" that comes from de-normalization, because it does not force you to update those "duplicates" with the same information across many parent documents just to make one change.

  • Related data is frequently used in association with the parent. If your "read/write" cases pretty much always need to "read/write" both parent and child, then it makes sense to embed the data so the operations are atomic.

Referencing

  • The related data would push the document past the 16MB BSON limit. You can always consider a hybrid approach of "bucketing", but the hard limit on the main document cannot be breached. Common cases are "posts" and "comments", where "comment" activity is expected to be very large.

  • Related data needs regular updating. This is essentially the case where you "normalize" because the data is "shared" among many parents, and the "related" data changes frequently enough that it would be impractical to update embedded items in every "parent" where that "child" item occurs. The easier option is to just reference the "child" and make the change once.

  • There is a clear separation of reads and writes. If you will not always need the "related" information when reading the "parent", or will not always need to alter the "parent" when writing to the child, there could be good reason to separate the model as referenced. Additionally, if there is a general need to update many "sub-documents" at once, and those "sub-documents" are actually references to another collection, then the implementation is quite often more efficient when the data is in a separate collection.
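
The update trade-off in the points above can be sketched in plain JavaScript over in-memory arrays. This is not MongoDB code: the `renameEmbedded` and `renameReferenced` helpers, the `prodId` field, and the sample data are made up for illustration. They count how many documents have to be written when one shared product is renamed.

```javascript
// Embedded (de-normalized): every order holds its own copy of the product data.
const embeddedOrders = [
  { _id: "o1", items: [{ prodId: "p1", prod: "ABC" }] },
  { _id: "o2", items: [{ prodId: "p1", prod: "ABC" }, { prodId: "p2", prod: "XYZ" }] },
  { _id: "o3", items: [{ prodId: "p2", prod: "XYZ" }] },
];

// Referenced (normalized): product data lives once, in its own collection.
const products = [
  { _id: "p1", prod: "ABC" },
  { _id: "p2", prod: "XYZ" },
];

// Rename a product inside every order that embeds it.
// Returns the number of order documents that had to be modified.
function renameEmbedded(orders, prodId, newName) {
  let written = 0;
  for (const order of orders) {
    let touched = false;
    for (const item of order.items) {
      if (item.prodId === prodId) {
        item.prod = newName;
        touched = true;
      }
    }
    if (touched) written += 1;
  }
  return written;
}

// Rename a product in the separate collection.
// Returns the number of product documents that had to be modified.
function renameReferenced(productList, prodId, newName) {
  let written = 0;
  for (const product of productList) {
    if (product._id === prodId) {
      product.prod = newName;
      written += 1;
    }
  }
  return written;
}

const embeddedWrites = renameEmbedded(embeddedOrders, "p1", "ABC v2");
const referencedWrites = renameReferenced(products, "p1", "ABC v2");

console.log(embeddedWrites); // 2
console.log(referencedWrites); // 1
```

With embedding the number of writes grows with the number of parents that share the child, while the referenced model writes once no matter how many orders point at the product.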

There is actually a much wider discussion of the "pros/cons" of either approach in the MongoDB documentation on Data Modelling, which covers various use cases and ways to approach either embedding or the referenced model that the populate method supports.

Hopefully the "dot points" are of use, but the general recommendation is to consider the data usage patterns of your application and choose what is best. Having the "option" to embed "should" be the reason you have chosen MongoDB, but it is how your application "uses the data" that decides which method best suits which part of your data modelling (as it is not "all or nothing").

  1. Note that since this was originally written, MongoDB introduced the $lookup operator, which does indeed perform "joins" between collections on the server. For the purposes of the general discussion here, whilst it is "better" in most circumstances than the "multiple query" overhead incurred by populate() and "multiple queries" in general, there is still a "significant overhead" incurred with any $lookup operation.
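
As a rough illustration of the note above, a $lookup stage for the sample order could look like the pipeline below. This is a sketch, not a tested query: the collection and field names are taken from the sample documents, the `$match` on `customerName` is an arbitrary choice, and it assumes a server version where `$lookup` matches each element of an array `localField` against `foreignField`. The snippet only builds and prints the pipeline object.

```javascript
// Aggregation pipeline asking the server to "join" items onto matching orders.
const pipeline = [
  // Select the parent order(s); the filter is an illustrative assumption.
  { $match: { customerName: "Bill" } },
  {
    $lookup: {
      from: "items",        // the "related" collection
      localField: "items",  // array of item ids stored on the order
      foreignField: "_id",  // field matched in the items collection
      as: "items",          // overwrite the id array with the joined documents
    },
  },
];

// In the mongo shell this would be passed as: db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

The join then happens in one request to the server, but the server still has to read from the second collection, which is where the remaining overhead comes from.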

The core design principle is that "embedded" means "already there" as opposed to "fetching from somewhere else". It is essentially the difference between "in your pocket" and "on the shelf", and in I/O terms usually more like "on the shelf in the library downtown", and notably further away for network based requests.
