How to group documents by matching array elements with MapReduce in MongoDB?

I have a collection in which each document has a field containing an array of strings. Example data:

name | words                          | ...
Ash  | ["Apple", "Pear", "Plum"]      | ...
Joe  | ["Walnut", "Peanut"]           | ...
Max  | ["Pineapple", "Apple", "Plum"] | ...

Now I would like to match these documents against a given array of words and group them by their matching rate.

Example input with expected result:

// matched for input = ["Walnut", "Peanut", "Apple"]
{
  "1.00": [{name:"Joe", match:"1.00"}],
  "0.33": [{name:"Ash", match:"0.33"}, {name:"Max", match:"0.33"}]
}

I am using the following map function, which emits each document with its matching rate as the key:

function map() {
    var matches = 0.0;
    for(var i in input) 
      if(this.words.indexOf(input[i]) !== -1) matches+=1;
    matches /= input.length;
    var key = ""+matches.toFixed(2);
    emit(key, {name: this.name, match: key});
}

What is still missing is a matching reduce function that combines the emitted key-value pairs into the result object.

I have tried it like this:

function reduce(key, values) {
    var res = {};
    res[key] = values;
    return res;
}

However, I have trouble with this part of the specification:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

...resulting in nested result objects. What is the correct way to group documents by their matching rate?
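The nesting can be reproduced outside MongoDB. The following is a minimal sketch in plain JavaScript that calls the reduce attempt twice by hand, the way a re-reduce would; the function name `naiveReduce` and the third document "Eve" are hypothetical and exist only for this illustration.

```javascript
// Reduce attempt from the question, renamed so it can be called directly.
function naiveReduce(key, values) {
    const res = {};
    res[key] = values;
    return res;
}

// Values as emitted by the map function for the key "0.33".
// "Eve" is a hypothetical extra document, added only for illustration.
const ash = {name: "Ash", match: "0.33"};
const max = {name: "Max", match: "0.33"};
const eve = {name: "Eve", match: "0.33"};

// First invocation: only part of the values for this key are reduced.
const partial = naiveReduce("0.33", [ash, max]);

// Second invocation: the previous output is one of the input values.
const rereduced = naiveReduce("0.33", [partial, eve]);

console.log(JSON.stringify(rereduced));
// {"0.33":[{"0.33":[{"name":"Ash","match":"0.33"},{"name":"Max","match":"0.33"}]},{"name":"Eve","match":"0.33"}]}
```

The first array element is itself a wrapped result object, which is the nesting described above.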

invoke the reduce function more than once for the same key.

That requirement is idempotence, and the reduce function must respect it: reducing an already reduced value again must not change it.

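To keep this simple, make sure the values emitted by map have the same format as the value returned by reduce. This also covers keys that are emitted only once: MongoDB does not call reduce for a key with a single value, so the emitted value goes to the output unchanged.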
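A quick way to check this property is to run a reduce function by hand in plain JavaScript. The sketch below assumes values shaped as `{users: [...]}`, the shape used by the code further down; the name `reduceUsers` and the document "Eve" are hypothetical and only serve the example.

```javascript
// Merges the users arrays of all values into one value of the same shape.
function reduceUsers(key, values) {
    let users = [];
    for (const value of values) {
        users = users.concat(value.users);
    }
    return {users: users};
}

// Values in the shape a map function would emit: {users: [ ... ]}.
const a = {users: [{name: "Ash", match: "0.33"}]};
const b = {users: [{name: "Max", match: "0.33"}]};
// "Eve" is a hypothetical extra document, added only for illustration.
const c = {users: [{name: "Eve", match: "0.33"}]};

// Reducing everything at once...
const allAtOnce = reduceUsers("0.33", [a, b, c]);
// ...should equal reducing a part first and feeding that output back in.
const inTwoSteps = reduceUsers("0.33", [reduceUsers("0.33", [a, b]), c]);
// Reducing an already reduced value again should leave it unchanged.
const reducedAgain = reduceUsers("0.33", [allAtOnce]);

console.log(JSON.stringify(allAtOnce) === JSON.stringify(inTwoSteps));   // true
console.log(JSON.stringify(allAtOnce) === JSON.stringify(reducedAgain)); // true
```

Because the output has the same shape as every input, it makes no difference how MongoDB splits the values for a key across invocations.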

For your case, something like this will work:

db.col.insert({"name": "Ash", "words": ["Apple", "Pear", "Plum"]})
db.col.insert({"name": "Joe", "words": ["Walnut", "Peanut"]})
db.col.insert({"name": "Max", "words": ["Pineapple", "Apple", "Plum"]})

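function map() {
    // Words to match against; hard-coded here to keep the example self-contained.
    var input = ["Walnut", "Peanut", "Apple"];

    // Count how many input words occur in this document's words array.
    var matches = 0;
    for (var i = 0; i < input.length; i++) {
        if (this.words.indexOf(input[i]) !== -1) matches += 1;
    }

    // The key is the fraction of input words matched, e.g. "0.33".
    var key = (matches / input.length).toFixed(2);

    // Emit the same shape that reduce returns: an object with a users array.
    emit(key, {users: [{name: this.name, match: key}]});
}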

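function reduce(key, values) {
    // Merge the users arrays of all values. The result has the same shape
    // as each input value, so it is safe to feed back into reduce.
    var users = [];
    for (var i = 0; i < values.length; i++) {
        users = users.concat(values[i].users);
    }
    return {users: users};
}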

db.col.mapReduce(map, reduce, {"out": {inline:1}})

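Output (Joe scores 0.67 rather than the 1.00 shown in the question, because map divides the number of matches by input.length; dividing by this.words.length instead would yield 1.00):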

{
    "results" : [
        {
            "_id" : "0.33",
            "value" : {
                "users" : [
                    {
                        "name" : "Ash",
                        "match" : "0.33"
                    },
                    {
                        "name" : "Max",
                        "match" : "0.33"
                    }
                ]
            }
        },
        {
            "_id" : "0.67",
            "value" : {
                "users" : [
                    {
                        "name" : "Joe",
                        "match" : "0.67"
                    }
                ]
            }
        }
    ],
    "timeMillis" : 22,
    "counts" : {
        "input" : 3,
        "emit" : 3,
        "reduce" : 1,
        "output" : 2
    },
    "ok" : 1
}
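To get from this output to the single object keyed by matching rate that the question asks for, the results array can be reshaped on the client. This is a minimal sketch in plain JavaScript; `groupByMatch` is a hypothetical helper name, and the `results` literal is copied from the output above (in the shell it would presumably come from the `results` field of the value returned by the inline mapReduce call).

```javascript
// The "results" array from the inline mapReduce output shown above.
const results = [
    {_id: "0.33", value: {users: [{name: "Ash", match: "0.33"},
                                  {name: "Max", match: "0.33"}]}},
    {_id: "0.67", value: {users: [{name: "Joe", match: "0.67"}]}}
];

// Hypothetical helper: builds one object that maps each matching rate
// to the array of users that share it.
function groupByMatch(results) {
    const grouped = {};
    for (const entry of results) {
        grouped[entry._id] = entry.value.users;
    }
    return grouped;
}

console.log(JSON.stringify(groupByMatch(results), null, 2));
```

Doing this step outside reduce keeps the reduce output in the same shape as its input, which is what makes repeated invocations safe.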
