
MongoDB embedded documents: size limit and aggregation performance concerns

MongoDB's documentation suggests putting as much data as possible in a single document. It also suggests NOT using ObjectId-reference-based sub-documents unless the data in those sub-documents must be referenced from more than one document.

In my case I have a one-to-many relationship like this:

Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        logs: [ mongoose.model("Log").schema ]
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Each Machine will have many Production_Log documents (more than one million). Using embedded documents, I hit the 16 MB per-document limit very quickly during my tests and could not add any more Production_Log documents to the Machine documents.
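To see why the limit is reached so quickly, here is a rough back-of-the-envelope estimate of the BSON size of one embedded log entry. It is only a sketch: it assumes the numbers are stored as 8-byte doubles, that `result` holds a short string such as "OK", that Mongoose adds an `_id` to each sub-document, and that array indexes are about six digits long. The helper names `element` and `logEntryBytes` are made up for this illustration.

```javascript
// Rough BSON size estimate for one embedded log entry (assumed field values).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's per-document limit

// BSON element = 1 type byte + field name + NUL terminator + value bytes.
const element = (name, valueBytes) => 1 + name.length + 1 + valueBytes;

function logEntryBytes(resultText) {
    const fields =
        element("_id", 12) +                                        // ObjectId added by Mongoose
        element("result", 4 + Buffer.byteLength(resultText) + 1) +  // length prefix + UTF-8 bytes + NUL
        element("operation", 8) +                                   // Date
        element("x", 8) + element("y", 8) + element("z", 8);        // assumed to be doubles
    return 4 + fields + 1; // document length prefix + elements + terminator
}

// Inside an array, each entry is itself an element keyed by its index ("0", "1", ...).
const perEntry = element("123456", logEntryBytes("OK")); // assumes a six-digit index
const maxEntries = Math.floor(MAX_BSON_BYTES / perEntry);
console.log(perEntry, maxEntries);
```

Under these assumptions each entry costs a little under 100 bytes, so a single Machine document tops out well below the one million logs expected per machine.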

My questions

  1. Is this a case where one should use ObjectId references instead of embedded documents?

  2. Is there any other solution I could evaluate?

  3. I will be accessing Production_Log documents to generate stats for each Machine using the aggregation framework. Should I take anything else into account in the schema design?

Thank you very much in advance for your advice!

Database normalization is not applicable to MongoDB

MongoDB scales better if you store the full information in a single document (accepting data redundancy). Database normalization obliges you to split data across different collections, and once your data grows, that causes bottlenecks.

Use only the Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

Read and write operations scale well this way.

With the aggregation framework you can process the data and compute the desired result.
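As an illustration of the aggregation step, here is a minimal sketch of a pipeline that computes per-machine statistics over the flattened Log collection. The helper name `buildMachineStatsPipeline`, the chosen statistics, and the cutoff date are assumptions for this example; only the field names come from the schema above.

```javascript
// Hypothetical helper: builds per-machine statistics over the flattened Log collection.
function buildMachineStatsPipeline(since) {
    return [
        // Filter first so that an index on `operation`, if one exists, can be used.
        { $match: { operation: { $gte: since } } },
        // One output document per machine model.
        { $group: {
            _id: "$model",
            logCount: { $sum: 1 },
            avgX: { $avg: "$x" },
            maxZ: { $max: "$z" }
        } },
        // Busiest machines first.
        { $sort: { logCount: -1 } }
    ];
}

const pipeline = buildMachineStatsPipeline(new Date("2024-01-01T00:00:00Z"));
console.log(JSON.stringify(pipeline, null, 2));
// With Mongoose this would be run as: const stats = await Log.aggregate(pipeline);
```

Putting `$match` at the start keeps the `$group` stage from scanning logs outside the period of interest.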

Please see if this approach suits your needs.

The Log collection will keep growing as more data is generated, whereas a Machine document will never exceed 16 MB. Instead of embedding the logs in the Machine document, try the reverse: embed the Machine in each Log document.

Your modified schema would look like this:

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true }        
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true },
        // Single embedded Machine: each log belongs to exactly one machine
        machine: mongoose.model("Machine").schema
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

If we still exceed the 16 MB document size limit, then in the Log schema we can create a new document for every hour, day, or week, depending on the volume of logs being generated.
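One way to read the per-period idea is as time buckets: one document per machine per hour or day, with logs appended through an upsert. The sketch below only builds the arguments for such an upsert. The field names `machineModel`, `bucketStart`, `count`, and `logs`, the model name `LogBucket`, and the machine model "M-100" are all assumptions for illustration, not part of the schemas above.

```javascript
// Truncate a timestamp to the start of its UTC hour or day.
function bucketStart(timestamp, granularity) {
    const d = new Date(timestamp);
    const hours = granularity === "hour" ? d.getUTCHours() : 0;
    return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), hours));
}

// Build the arguments for an upsert that appends a log to its bucket document.
// `machineModel`, `bucketStart`, `count`, and `logs` are assumed field names.
function bucketUpsert(machineModel, log, granularity = "day") {
    return {
        filter: { machineModel, bucketStart: bucketStart(log.operation, granularity) },
        update: { $push: { logs: log }, $inc: { count: 1 } },
        options: { upsert: true } // creates the bucket on its first log
    };
}

const op = bucketUpsert("M-100", {
    result: "OK", operation: new Date("2024-03-05T14:37:00Z"), x: 1, y: 2, z: 3
});
console.log(op.filter.bucketStart.toISOString()); // 2024-03-05T00:00:00.000Z
// With Mongoose (hypothetical model): await LogBucket.updateOne(op.filter, op.update, op.options);
```

The granularity should be chosen so that one bucket stays comfortably below the 16 MB limit at the expected log rate.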
