Merge and aggregate some fields of two or more collections with identical schema in MongoDB

I have several collections with an identical schema and I want to perform a merge + aggregation on them. The documents are simple and look like this:

{ 'fr': 1, 'to': 1, 'wg': 213}
{ 'fr': 1, 'to': 2, 'wg': 53}
{ 'fr': 2, 'to': 2, 'wg': 5521}

The following code works for merging two collections, but I am wondering whether there is a faster solution, and/or one that could merge multiple collections in a similar way without creating nested calls:

var c = db.collection('first').find()

c.each(function(err, doc) {
    if (err) throw err

    if (doc == null) {
        console.log('done')
        return
    }
    db.collection('second').findOne({
        'fr': doc['fr'],
        'to': doc['to']
    }, function(err, doc2) {
        if (err) throw err

        db.collection('my_results').save({
            'fr': doc['fr'],
            'to': doc['to'],
            'wg': doc['wg'] + doc2['wg']
        }, function(err) {
            if (err) throw err
        })
    })
})

No approach here comes entirely for free, since you cannot do joins with MongoDB. But you can get the output you want using mapReduce and some of its features.

So first create a mapper:

var mapper = function () {

  emit( { fr: this.fr, to: this.to }, this.wg )

};

And then a reducer:

var reducer = function (key,values) {

  return Array.sum( values );

};

Then you run the mapReduce operation with the output set to a different collection:

db.first.mapReduce(mapper,reducer,{ "out": { "reduce": "third" } })

Note the "out" option there, which is explained in the mapReduce section of the manual. The key point is that the "reduce" action is very important, even though the statistics printed in the console may be misleading. It comes into play when we run the same code against the other collection:

db.second.mapReduce(mapper,reducer,{ "out": { "reduce": "third" } })

What actually happens is that the output of the first operation, already stored in "third", is also passed into the "reduce" phase of the second operation.
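To make that behaviour concrete, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. The helper `runMapReduce` is hypothetical and only imitates the "reduce" output action; `emit` is passed in as an argument rather than being a global, and the sum is written out because `Array.sum` is a shell helper. It also assumes both collections hold the same three sample documents, which matches the totals in the answer.

```javascript
// Stand-ins for the answer's mapper and reducer (assumptions noted above).
const mapper = function (emit) {
  emit({ fr: this.fr, to: this.to }, this.wg);
};
const reducer = function (key, values) {
  return values.reduce((a, b) => a + b, 0);
};

// Hypothetical helper imitating mapReduce with { out: { reduce: ... } }.
// `target` is a Map standing in for the output collection.
function runMapReduce(docs, target) {
  // Map phase: group emitted values by their (serialised) key.
  const groups = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((doc) => mapper.call(doc, emit));

  for (const [k, { key, values }] of groups) {
    // Reduce within this run (skipped when a key has a single value).
    let result = values.length > 1 ? reducer(key, values) : values[0];
    // "reduce" output action: combine with the value already stored.
    if (target.has(k)) {
      result = reducer(key, [target.get(k).value, result]);
    }
    target.set(k, { _id: key, value: result });
  }
}

const docs = [
  { fr: 1, to: 1, wg: 213 },
  { fr: 1, to: 2, wg: 53 },
  { fr: 2, to: 2, wg: 5521 }
];
const third = new Map();
runMapReduce(docs, third); // stands in for db.first.mapReduce(...)
runMapReduce(docs, third); // stands in for db.second.mapReduce(...)
for (const row of third.values()) console.log(JSON.stringify(row));
```

The second call finds each key already present in the target, so the stored value and the new value go through the reducer together, which is why the totals add up across runs.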

The end result is that all the values from both collections with the same key values will be added together in the "third" collection:

{ "_id" : { "fr" : 1, "to" : 1 }, "value" : 426 }
{ "_id" : { "fr" : 1, "to" : 2 }, "value" : 106 }
{ "_id" : { "fr" : 2, "to" : 2 }, "value" : 11042 }
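Since each run only depends on the target collection, the same call can be repeated for any number of source collections without nesting. A rough sketch for the legacy mongo shell might look like the following; the function name `mergeCollections` and its parameters are hypothetical, and `db` is passed in explicitly so nothing relies on a global.

```javascript
// Hypothetical helper: run the same mapReduce over several collections,
// reducing every result into one target collection.
function mergeCollections(db, sourceNames, targetName, mapper, reducer) {
  sourceNames.forEach(function (name) {
    db.getCollection(name).mapReduce(mapper, reducer, {
      out: { reduce: targetName }
    });
  });
}

// Possible usage in the shell, with the mapper and reducer defined earlier:
// mergeCollections(db, ["first", "second"], "third", mapper, reducer);
```

Because the "reduce" action folds new results into whatever is already stored, the target collection should start out empty, otherwise a repeated run would add the same values again.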

You can make that a little fancier if you want "fr" and "to" to be treated as a single unique combination regardless of their order, or even run another mapReduce or aggregate over those results.
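One way the order-independent variant could look is a mapper that normalises the pair before emitting it. This is only a sketch: the name `unorderedMapper` is hypothetical, it assumes "fr" and "to" are numeric, and it relies on the global `emit` that MongoDB provides inside a map function.

```javascript
// Hypothetical mapper: (1, 2) and (2, 1) are emitted under the same key.
// Assumes numeric fr/to; `emit` is supplied by MongoDB at run time.
var unorderedMapper = function () {
  var lo = Math.min(this.fr, this.to);
  var hi = Math.max(this.fr, this.to);
  emit({ fr: lo, to: hi }, this.wg);
};
```

The reducer stays unchanged, since it only sums whatever values arrive for a key.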
