Two MongoDB collections in one query

Question

I have a big collection of clients and a huge collection of client data. The collections are separate, and I don't want to combine them into a single collection (because of other servlets that already work with them), but now I need to "join" data from both collections into a single result.

Since the query should return a large number of results, I don't want to query the server once and then use the result to query again. I'm also concerned about the traffic between the server and the DB and the memory that the result set will occupy in the server RAM.

The way it works now is that I get the relevant client list from the 'clients' collection, send this list to the query on the 'client data' collection, and only then get the aggregated results.

I want to cut out the step of fetching the client list and sending it right back to the DB server. The DB server should ask itself: the query on the client data collection should ask the clients collection for the relevant client list.

How can I use a stored procedure (JavaScript functions) to do the query in the DB and return only the relevant clients out of the collection? Alternatively, is there a way to write a query that joins results from another collection?

Answer 1

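"Good news everyone": this aggregation query works just fine in the mongo shell as a join query.

// The shell evaluates db.clients.distinct(...) first, as a separate query,
// and passes the resulting array of ids to $in.
db.clientData.aggregate([{
    // Keep only the data of clients tagged "qa".
    $match: {
        id: {
            $in: db.clients.distinct("_id",
            {
                "tag": "qa"
            })
        }
    }
},
{
    // Sum the working time per computer.
    $group: {
        _id: "$computerId",
        total_usage: {
            $sum: "$workingTime"
        }
    }
}]);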

Answer 2

The key idea with MongoDB data modelling is to be write-heavy, not read-heavy: store the data in the format that you need for reading, not in some format that minimizes or avoids redundancy (i.e. use a denormalized data model).
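To illustrate the denormalized model described above, here is a minimal sketch, assuming the client's tag is copied onto every client data document at write time. `withClientTag` and the `clientTag` field are hypothetical names, not part of the original schema; the other field names are taken from the aggregation in Answer 1.

```javascript
// Sketch only: withClientTag and clientTag are hypothetical names.
function withClientTag(dataDoc, client) {
    // Copy the document and embed the tag, leaving the input untouched.
    return Object.assign({}, dataDoc, { clientTag: client.tag });
}

const client = { _id: "c1", tag: "qa" };
const stored = withClientTag({ id: "c1", computerId: "pc-7", workingTime: 42 }, client);
console.log(JSON.stringify(stored));

// With the tag stored on every clientData document, the aggregation
// would only need one collection (mongo shell):
//   db.clientData.aggregate([
//       { $match: { clientTag: "qa" } },
//       { $group: { _id: "$computerId", total_usage: { $sum: "$workingTime" } } }
//   ]);
```

The price is paid on the write side: whenever a client's tag changes, every one of its data documents has to be updated as well.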

I don't want to combine them into a single collection

That's not a good argument.

I'm also concerned about the traffic between the server and the DB [...]

If you need the data, you need the data. How does the way it is queried make a difference here?

[...] and the memory that the result set will occupy in the server RAM.

Is the amount of data so large that you want to stream it from the server to the client, so that it is transferred in chunks? How much data are we talking about, and why does the client read it all?

How can I use a stored procedure to do the query in the DB and return only the relevant clients out of the collection

There are no stored procedures in MongoDB, but you can use server-side map/reduce to 'join' collections. Generally, code that is stored in and run by the database violates the separation of concerns of a layered architecture. I consider it one of the ugliest hacks of all time, but that's debatable.

Also, less debatable: keep in mind that M/R has huge overhead in MongoDB and is not geared towards real-time queries made e.g. in a web server call. These calls will take hundreds of milliseconds.

Is there a way to write a query that joins results from another collection?

No, operations are constrained to a single collection. You can perform a second query and use the $in operator there, however, which is similar to a subselect and reasonably fast, but of course requires two round-trips.
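The two-query approach with $in can be sketched roughly as follows. This is only an illustration: `buildUsagePipeline` is a hypothetical helper, not a MongoDB API, and the field names (`id`, `computerId`, `workingTime`, `tag`) are the ones used in Answer 1.

```javascript
// Sketch only: buildUsagePipeline is a hypothetical helper.
function buildUsagePipeline(clientIds) {
    return [
        // The "subselect": keep only documents of the selected clients.
        { $match: { id: { $in: clientIds } } },
        // Sum the working time per computer.
        { $group: { _id: "$computerId", total_usage: { $sum: "$workingTime" } } }
    ];
}

// In the mongo shell the two round-trips would look like this:
//   var ids = db.clients.distinct("_id", { tag: "qa" });   // query 1
//   db.clientData.aggregate(buildUsagePipeline(ids));      // query 2

// Stand-alone check of the pipeline shape with made-up ids.
const pipeline = buildUsagePipeline([1, 2, 3]);
console.log(JSON.stringify(pipeline));
```

Only the list of ids travels back and forth between the two queries; the data documents themselves are read once, by the second query.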

Answer 3

How can I use a stored procedure to do the query in the DB and return only the relevant clients out of the collection.

There are no stored procedures in MongoDB.

Alternatively, is there a way to write a query that joins results from another collection?

You normally don't need to do any joins in MongoDB, and there is no such thing. The flexibility of the document model already covers the typical need for joins. You should think about your document model: asking how to design joins out of your schema should always be your first port of call. As an alternative, you may need to use aggregation or map-reduce on the server side to handle this.

Answer 4

First of all, mnemosyn and Michael9 are right. But if I were in your shoes, also assuming that the client data collection is one document per client, I would store the document ID of the client data document in the client document to make the "join" (still no joins in Mongo) easier.

If you have several client data documents per client, store an array of document IDs instead.

But none of this saves you from having to implement the "join" in your application code; if it's a Rails app, then probably in your controller.
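As a rough sketch of such an application-side "join", assume each client document carries a hypothetical `dataIds` array holding the IDs of its client data documents. The function name, the field name, and the sample documents below are made up for illustration; in a real application the data documents would first be fetched from the database, for example with a single `find` using `$in` on `_id`.

```javascript
// Sketch only: joinClientData, dataIds and the sample documents are hypothetical.
function joinClientData(clients, dataDocs) {
    // Index the data documents by _id so each lookup is cheap.
    const byId = new Map(dataDocs.map((d) => [d._id, d]));
    return clients.map((client) => ({
        _id: client._id,
        tag: client.tag,
        // Resolve each stored ID; skip references that point nowhere.
        data: (client.dataIds || [])
            .map((id) => byId.get(id))
            .filter((d) => d !== undefined)
    }));
}

const clients = [
    { _id: "c1", tag: "qa", dataIds: ["d1", "d2"] },
    { _id: "c2", tag: "qa", dataIds: ["d3", "missing"] }
];
const dataDocs = [
    { _id: "d1", workingTime: 10 },
    { _id: "d2", workingTime: 5 },
    { _id: "d3", workingTime: 7 }
];
const joined = joinClientData(clients, dataDocs);
console.log(JSON.stringify(joined));
```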

4 answers

solution1 3 ACCEPTED 2015-02-23 09:53:34

solution2 1 2015-02-15 15:01:41

solution3 0 2015-02-15 15:01:26

solution4 0 2015-02-15 17:53:47
