
How can I store and search through large documents with MongoDB?

Here's a DB schema/architecture problem.

Currently our project uses MongoDB. We have one DB with one collection, which holds almost 4 billion documents in total (that number is constant). Each document has a unique ID of our own, and there is a lot of varied information related to that ID (that's why MongoDB was chosen: the data differs completely from document to document, so a schemaless store is a perfect fit).

{
    "_id": ObjectID("5c619e81aeeb3aa0163acf02"),
    "our_id": 1552322211,
    "field_1": "Here is some information",
    "field_a": 133,
    "field_c": 561232,
    "field_b": {
        "field_0": 1,
        "field_z": [45, 11, 36]
    }
}

The purpose of that collection is to store a lot of data that is easy to update (some data is updated every day, some once a month) and to search over different fields to retrieve the ID. We also store the "history" of each field, and we need to be able to search over that history as well. So once updates over time were turned on, we ran into MongoDB's 16 MB maximum document size.

We've tried several workarounds (such as splitting the document), but all of them require either a $group or a $lookup stage in the aggregation (grouping by ID, see the example below), and neither can use indexes, which makes a search over several fields EXTREMELY slow.

{
    "_id": ObjectID("5c619e81aeeb3aa0163acd12"),
    "our_id": 1552322211,
    "field_1": "Here is some information",
    "field_a": 133
}


{
    "_id": ObjectID("5c619e81aeeb3aa0163acd11"),
    "our_id": 1552322211,
    "field_c": 561232,
    "field_b": {
        "field_0": 1,
        "field_z": [45, 11, 36]
    }
}

We also can't put a $match stage before those, because the search can include logical operators (like field_1 = 'a' && field_c != 320, where field_1 comes from one document and field_c from another), so the search must run after the documents are grouped/joined together. On top of that, the logical expression can be VERY complex.
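To make the ordering problem concrete, here is a minimal sketch in Python. PIPELINE shows what a group-then-match aggregation might look like for the example condition; the "merged" field name is an assumption, and the pipeline is only built here, not run against a server. The FRAGMENTS list is made-up illustration data, and the two helper functions emulate in plain Python what happens when the filter runs before versus after the fragments are merged.

```python
# Sketch of a group-then-match pipeline ("merged" is a hypothetical field name).
PIPELINE = [
    # Merge every fragment that shares an our_id into one document.
    {"$group": {"_id": "$our_id", "merged": {"$mergeObjects": "$$ROOT"}}},
    # field_1 = 'a' && field_c != 320, evaluated on the merged document.
    {"$match": {"merged.field_1": "a", "merged.field_c": {"$ne": 320}}},
]

# Made-up fragments for illustration: two ids, each split into two documents.
FRAGMENTS = [
    {"our_id": 1, "field_1": "a", "field_a": 133},
    {"our_id": 1, "field_c": 561232},
    {"our_id": 2, "field_1": "a"},
    {"our_id": 2, "field_c": 320},
]


def matches(doc):
    """field_1 = 'a' && field_c != 320; a missing field_c counts as != 320."""
    return doc.get("field_1") == "a" and doc.get("field_c") != 320


def filter_then_group(fragments):
    """Apply the condition to each fragment on its own (the wrong order)."""
    return {frag["our_id"] for frag in fragments if matches(frag)}


def group_then_filter(fragments):
    """Merge fragments per our_id first, then apply the condition."""
    merged = {}
    for frag in fragments:
        merged.setdefault(frag["our_id"], {}).update(frag)
    return {our_id for our_id, doc in merged.items() if matches(doc)}


print(filter_then_group(FRAGMENTS))  # {1, 2}: id 2 is a false positive
print(group_then_filter(FRAGMENTS))  # {1}
```

Filtering each fragment on its own wrongly returns id 2, because the fragment holding field_1 cannot see that field_c is 320 in its sibling. With pymongo, PIPELINE would be passed to aggregate() on the collection; since $group comes first, nothing narrows its input, which fits the slow searches described above.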

So are there any tricky workarounds? If not, what other databases would you suggest moving to?

Kind regards.

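OK, so after spending some time testing different approaches, I ended up using Elasticsearch, because there was no way to run the requested searches through MongoDB in adequate time.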
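The answer gives no details about the Elasticsearch setup, so the following is only a sketch of how the example condition from the question (field_1 = 'a' && field_c != 320) could be written as an Elasticsearch bool query. It assumes one Elasticsearch document per our_id and that field_1 is mapped as a keyword field; both are assumptions, not something stated above. The body is built as a plain dict and printed, not sent to a cluster.

```python
import json


def build_query():
    """Build a bool query body for: field_1 = 'a' AND field_c != 320."""
    return {
        "query": {
            "bool": {
                # Exact match; assumes field_1 is mapped as a keyword field.
                "filter": [{"term": {"field_1": "a"}}],
                # Exclude documents whose field_c equals 320.
                "must_not": [{"term": {"field_c": 320}}],
            }
        }
    }


print(json.dumps(build_query(), indent=2))
```

More complex logical expressions can be expressed by nesting further bool clauses inside filter, should, or must_not.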


 