
MongoDB: Function to Consolidate Arrays

I have a large dataset with documents that sometimes cross-reference each other and sometimes do not. Before I can run a map-reduce based on those cross-references, I have to make the array of cross-references the same for every value in the cross-reference.

I use this function in the shell to consolidate those arrays:

function fixArray2() {
var counter = 0;
// I only want the xref for each field, I don't even want the id
var cursor = db.catalog.find({}, {xref: true, _id: false});

// I don't want to init this inside the loop, worried about memory leaks
var consolidatedArray = [];
while (cursor.hasNext()) {
    var xref1 = cursor.next().xref;
    // first pass: create a consolidated array when the cross references match
    var limitedCursor1 = db.catalog.find({"name":{$in:xref1}});
    while (limitedCursor1.hasNext()) {
        var doc1 = limitedCursor1.next();
        consolidatedArray = consolidatedArray.concat(doc1.xref);
    }
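    // de-duplicate with a plain filter so this does not depend on a custom Array.prototype.unique
    consolidatedArray = consolidatedArray.filter(function (v, i, a) { return a.indexOf(v) === i; });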
    // now that we have the consolidated array, reset the xref field of the object to it
    for (var i=0; i<consolidatedArray.length; i++) {
        db.catalog.update({name:consolidatedArray[i]},{$set:{xref: consolidatedArray}},false, true);
    }

    consolidatedArray.length = 0;

    counter++;
    if (counter % 1000 == 0) {
        print("Processed " + counter + " documents.");
    }
}

}

It works, but I have to run it fairly often. Can anyone suggest improvements?

If you do the work up front, when writing the documents to the collection, you may be able to avoid running this consolidation as a separate pass later.

Therefore, get the list of documents that should be cross-referenced and write it with the document upon insertion. Update it as needed, for example when a document is removed or no longer references another.
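
As a rough illustration of the insert-time approach, the sketch below computes the consolidated array for a new document from the documents it references. The helper name planInsert and the shape of its return value are hypothetical, and it assumes (this is not stated in the question) that each xref array includes the document's own name and that existing groups are already consolidated, so one level of merging is enough. In-memory arrays stand in for the query results so the logic can be run without a database.

```javascript
// Hypothetical helper: plan the writes needed to insert `newDoc` while keeping
// every member of its cross-reference group pointing at the same xref array.
// `existingDocs` stands in for the result of
//   db.catalog.find({ name: { $in: newDoc.xref } }).toArray()
function planInsert(existingDocs, newDoc) {
    // Start from the new document's own name and its declared references.
    // (Assumption: a document's xref array includes its own name.)
    var merged = [newDoc.name].concat(newDoc.xref || []);

    // Pull in everything the referenced documents already point at.
    existingDocs.forEach(function (doc) {
        merged = merged.concat(doc.xref || []);
    });

    // De-duplicate while preserving first-seen order.
    merged = merged.filter(function (name, i, all) {
        return all.indexOf(name) === i;
    });

    return {
        // Document to insert, carrying the consolidated array.
        insert: { name: newDoc.name, xref: merged },
        // Filter and update for the documents that must now share that array.
        updateFilter: { name: { $in: merged } },
        update: { $set: { xref: merged } }
    };
}

// Example: "a" is already in the collection and cross-references "b".
var existing = [{ name: "a", xref: ["a", "b"] }];
var plan = planInsert(existing, { name: "c", xref: ["a"] });

// In the shell, the plan could then be applied roughly like this:
//   db.catalog.insert(plan.insert);
//   db.catalog.update(plan.updateFilter, plan.update, false, true);
```

With the example data, plan.insert.xref comes out as ["c", "a", "b"], and the multi-update brings "a" and "b" in line with it. Removing a document would need the reverse step, for instance a $pull of its name from the xref arrays of the remaining group members.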

