I have a MongoDB db with a single rather large collection of documents (13GB for about 2M documents) sitting on a single server with 8GB RAM. Each document has a text field that can be relatively large (it can be a whole blog post) and the other fields are data about the text content and the text author. Here's what the schema looks like:
{
  text: "Last night there was a storm in San Francisco...",
  author: {
    name: "Firstname Lastname",
    website_url: "http://..."
  },
  date: "201403075612",
  language: "en",
  concepts: [
    {name: "WeatherConcept", hit: "storm", start: 23, stop: 28},
    {name: "LocationConcept", hit: "San Francisco", start: 32, stop: 45}
  ],
  location: "us",
  coordinates: []
}
I'm planning to query the data in different ways:
Full-text search on the "text" field. So let's say my text search query is q:
db.coll.aggregate([ { $match:{ $text: { $search:q } } } ])
Aggregate documents by author:
db.coll.aggregate([ { $project: { name: "$author.name", url: "$author.website_url" } }, { $group: { _id: "$name", size: { $sum:1 }, url: { $first: "$url" } } }, { $sort:{ size:-1 } } ])
Aggregate documents by concepts:
db.coll.aggregate([ { $unwind: "$concepts" }, { $group: { _id: "$concepts.name", size: { $sum:1 } } }, { $sort:{ size:-1 } } ])
These three queries may also include filtering on the following fields: date, location, coordinates, language, author.
I don't have any indexes in place yet, so the queries run very slowly. But since the indexes would be very different for the different ways I hit the data, does that rule out indexing as a solution? Or is there a way to index for all these cases and not have to shard the collection? Basically my questions are:
Do you have any indexes on your collection?
Have a look at the indexing documentation:
http://docs.mongodb.org/manual/indexes/
If you do have indexes, make sure they are actually being hit by running explain() on a query:
db.coll.find({"concepts.name": "WeatherConcept"}).explain();
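To make "being hit" concrete, here is a rough sketch of how the explain() output could be checked. It assumes the explain format of MongoDB 3.0 and later on an unsharded server, where the plan is a tree under queryPlanner.winningPlan and an index scan appears as an IXSCAN stage while a full collection scan appears as COLLSCAN. The helper names planStages and usesIndex and the two sample plans are made up for illustration; they are not part of MongoDB.

```javascript
// Collect every stage name from an explain() plan tree (MongoDB 3.0+ output shape).
function planStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage].concat(...children.map(function (child) { return planStages(child); }));
}

// True when the winning plan reads through an index and never scans the whole collection.
function usesIndex(explainOutput) {
  const stages = planStages(explainOutput.queryPlanner.winningPlan);
  return stages.includes("IXSCAN") && !stages.includes("COLLSCAN");
}

// Hypothetical, hand-written explain outputs, for illustration only.
const withIndex = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "concepts.name_1" } }
  }
};
const withoutIndex = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesIndex(withIndex));    // true
console.log(usesIndex(withoutIndex)); // false

// In the mongo shell, the same check could be applied to a real query:
// usesIndex(db.coll.find({ "concepts.name": "WeatherConcept" }).explain())
```

Older servers (2.x) report a cursor field instead (BasicCursor for a full scan, BtreeCursor for an index), so the helper above would need adapting there.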
You also need to give us more information about your setup. How much RAM does the server have? I've worked with a MongoDB deployment holding 200GB across 3 shards, so 13GB on a single server shouldn't be an issue.
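As a starting point, something like the following could cover the three access patterns from the question. This is only a sketch under assumptions: the field order of the compound filter index is a guess, the helper conceptsPipeline is a made-up name, and whether all of these indexes fit comfortably in RAM is something you would have to measure (for example with db.coll.stats()). Two things I am sure of: a collection can have at most one text index, and in an aggregation the $text match has to be the first stage of the pipeline.

```javascript
// Candidate index definitions, one per access pattern described in the question.
const candidateIndexes = [
  { keys: { text: "text" } },                      // full-text search (only one text index per collection)
  { keys: { "author.name": 1 } },                  // filtering by author
  { keys: { "concepts.name": 1 } },                // multikey index over the concepts array
  { keys: { language: 1, location: 1, date: 1 } }  // common filters; this field order is an assumption
];

// Put the filter first so the $match stage can use an index before $unwind and $group run.
function conceptsPipeline(filter) {
  return [
    { $match: filter },
    { $unwind: "$concepts" },
    { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
    { $sort: { size: -1 } }
  ];
}

// Only touches the database when run inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  candidateIndexes.forEach(function (ix) { db.coll.createIndex(ix.keys); });
  printjson(db.coll.aggregate(conceptsPipeline({ language: "en", location: "us" })).toArray());
}
```

Note that an index mainly helps the initial $match here. A $group over every document, like the unfiltered author and concept aggregations in the question, still has to read the whole collection.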