简体   繁体   中英

Search on multiple collections in MongoDB

I know the theory of MongoDB and the fact that is doesn't support joins, and that I should use embeded documents or denormalize as much as possible, but here goes:

I have multiple documents, such as:

  • Users, which embed Suburbs, but also has: first name, last name
  • Suburbs, which embed States
  • Child, which embeds School, belongs to a User, but also has: first name, last name

Example:

Users:
{ _id: 1, first_name: 'Bill', last_name: 'Gates', suburb: 1 }
{ _id: 2, first_name: 'Steve', last_name: 'Jobs', suburb: 3 }

Suburb:
{ _id: 1, name: 'Suburb A', state: 1 }
{ _id: 2, name: 'Suburb B', state: 1 }
{ _id: 3, name: 'Suburb C', state: 3 }

State:
{ _id: 1, name: 'LA' }
{ _id: 3, name: 'NY' }

Child:
{ _id: 1, _user_id: 1, first_name: 'Little Billy', last_name: 'Gates' }
{ _id: 2, _user_id: 2, first_name: 'Little Stevie', last_name: 'Jobs' }

The search I need to implement is on:

  • first name, last name of Users and Child
  • State from Users

I know that I have to do multiple queries to get it done, but how can that be achieved? With mapReduce or aggregate?

Can you point out a solution please?

I've tried to use mapReduce but that didn't get me to have documents from Users which contained a state_id, so that's why I brought it up here.

This answer is outdated. Since version 3.2, MongoDB has limited support for left outer joins with the $lookup aggregation operator

MongoDB does not do queries which span multiple collections - period. When you need to join data from multiple collections, you have to do it on the application level by doing multiple queries.

  1. Query collection A
  2. Get the secondary keys from the result and put them into an array
  3. Query collection B passing that array as the value of the $in-operator
  4. Join the results of both queries programmatically on the application layer

Having to do this should be rather the exception than the norm. When you frequently need to emulate JOINs like that, it either means that you are still thinking too relational when you design your database schema or that your data is simply not suited for the document-based storage concept of MongoDB.

You'll find MongoDB easier to understand if you take a denormalized approach to schema design. That is, you want to structure your documents the way the requesting client application understands them. Essentially, you are modeling your documents as domain objects with which the applicaiton deals. Joins become less important when you model your data this way. Consider how I've denormalized your data into a single collection:

{  
    _id: 1, 
    first_name: 'Bill', 
    last_name: 'Gates', 
    suburb: 'Suburb A',
    state: 'LA',
    child : [ 3 ]
}

{ 
    _id: 2, 
    first_name: 'Steve', 
    last_name: 'Jobs', 
    suburb: 'Suburb C',
    state 'NY',
    child: [ 4 ] 
}
{ 
    _id: 3, 
    first_name: 'Little Billy', 
    last_name: 'Gates',
    suburb: 'Suburb A',
    state: 'LA',
    parent : [ 1 ]
}

{
    _id: 4, 
    first_name: 'Little Stevie', 
    last_name: 'Jobs'
    suburb: 'Suburb C',
    state 'NY',
    parent: [ 2 ]
}

The first advantage is that this schema is far easier to query. Plus, updates to address fields are now consistent with the individual Person entity since the fields are embedded in a single document. Notice also the bidirectional relationship between parent and children? This makes this collection more than just a collection of individual people. The parent-child relationships mean this collection is also a social graph . Here are some resoures which may be helpful to you when thinking about schema design in MongoDB .

So now join is possible in mongodb and you can achieve this using $lookup and $facet aggregation here and which is probably the best way to find in multiple collections

db.collection.aggregate([
  { "$limit": 1 },
  { "$facet": {
    "c1": [
      { "$lookup": {
        "from": Users.collection.name,
        "pipeline": [
          { "$match": { "first_name": "your_search_data" } }
        ],
        "as": "collection1"
      }}
    ],
    "c2": [
      { "$lookup": {
        "from": State.collection.name,
        "pipeline": [
          { "$match": { "name": "your_search_data" } }
        ],
        "as": "collection2"
      }}
    ],
    "c3": [
      { "$lookup": {
        "from": State.collection.name,
        "pipeline": [
          { "$match": { "name": "your_search_data" } }
        ],
        "as": "collection3"
      }}
    ]
  }},
  { "$project": {
    "data": {
      "$concatArrays": [ "$c1", "$c2", "$c3" ]
    }
  }},
  { "$unwind": "$data" },
  { "$replaceRoot": { "newRoot": "$data" } }
])

Here's a JavaScript function that will return an array of all records matching specified criteria, searching across all collections in the current database:

function searchAll(query,fields,sort) {
    var all = db.getCollectionNames();
    var results = [];
    for (var i in all) {
        var coll = all[i];
        if (coll == "system.indexes") continue;
        db[coll].find(query,fields).sort(sort).forEach(
            function (rec) {results.push(rec);} );
    }
    return results;
}

From the Mongo shell, you can copy/paste the function in, then call it like so:

> var recs = searchAll( {filename: {$regex:'.pdf$'} }, {moddate:1,filename:1,_id:0}, {filename:1} ) > recs

Based on @brian-moquin and others, I made a set of functions to search entire collections with entire keys(fields) by simple keyword.

It's in my gist; https://gist.github.com/fkiller/005dc8a07eaa3321110b3e5753dda71b

For more detail, I first made a function to gather all keys.

function keys(collectionName) {
    mr = db.runCommand({
        'mapreduce': collectionName,
        'map': function () {
            for (var key in this) { emit(key, null); }
        },
        'reduce': function (key, stuff) { return null; },
        'out': 'my_collection' + '_keys'
    });
    return db[mr.result].distinct('_id');
}

Then one more to generate $or query from keys array.

function createOR(fieldNames, keyword) {
    var query = [];
    fieldNames.forEach(function (item) {
        var temp = {};
        temp[item] = { $regex: '.*' + keyword + '.*' };
        query.push(temp);
    });
    if (query.length == 0) return false;
    return { $or: query };
}

Below is a function to search a single collection.

function findany(collection, keyword) {
    var query = createOR(keys(collection.getName()));
    if (query) {
        return collection.findOne(query, keyword);
    } else {
        return false;
    }
}

And, finally a search function for every collections.

function searchAll(keyword) {
    var all = db.getCollectionNames();
    var results = [];
    all.forEach(function (collectionName) {
        print(collectionName);
        if (db[collectionName]) results.push(findany(db[collectionName], keyword));
    });
    return results;
}

You can simply load all functions in Mongo console, and execute searchAll('any keyword')

You can achieve this using $mergeObjects by MongoDB Driver Example Create a collection orders with the following documents:

db.orders.insert([
  { "_id" : 1, "item" : "abc", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "jkl", "price" : 20, "ordered" : 1 }
])

Create another collection items with the following documents:

db.items.insert([
  { "_id" : 1, "item" : "abc", description: "product 1", "instock" : 120 },
  { "_id" : 2, "item" : "def", description: "product 2", "instock" : 80 },
  { "_id" : 3, "item" : "jkl", description: "product 3", "instock" : 60 }
])

The following operation first uses the $lookup stage to join the two collections by the item fields and then uses $mergeObjects in the $replaceRoot to merge the joined documents from items and orders:

db.orders.aggregate([
   {
      $lookup: {
         from: "items",
         localField: "item",    // field in the orders collection
         foreignField: "item",  // field in the items collection
         as: "fromItems"
      }
   },
   {
      $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
   },
   { $project: { fromItems: 0 } }
])

The operation returns the following documents:

{ "_id" : 1, "item" : "abc", "description" : "product 1", "instock" : 120, "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "jkl", "description" : "product 3", "instock" : 60, "price" : 20, "ordered" : 1 }

This Technique merge Object and return the result

Minime solution worked except that it required a fix: var query = createOR(keys(collection.getName())); need to add keyword as 2nd parameter to createOR call here.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM