
Query running very slow on big MongoDB db

I have a MongoDB database with a single, rather large collection (13GB for about 2M documents) sitting on a single server with 8GB of RAM. Each document has a text field that can be relatively large (it can be a whole blog post); the other fields hold data about the text content and the text's author. Here's what the schema looks like:

{
    text: "Last night there was a storm in San Francisco...",
    author: {
        name: "Firstname Lastname",
        website_url: "http://..."
    },
    date: "201403075612",
    language: "en",
    concepts: [
        {name: "WeatherConcept", hit: "storm", start: 23, stop: 28},
        {name: "LocationConcept", hit: "San Francisco", start: 32, stop: 45}
    ],
    location: "us",
    coordinates: []
}

I'm planning to query the data in different ways:

  1. Full-text search on the "text" field. So let's say my text search query is q:

     db.coll.aggregate([ { $match:{ $text: { $search:q } } } ]) 
  2. Aggregate documents by author:

     db.coll.aggregate([
         { $project: { name: "$author.name", url: "$author.website_url" } },
         { $group: { _id: "$name", size: { $sum: 1 }, url: { $first: "$url" } } },
         { $sort: { size: -1 } }
     ])
  3. Aggregate documents by concepts:

     db.coll.aggregate([
         { $unwind: "$concepts" },
         { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
         { $sort: { size: -1 } }
     ])

These three queries may also include filtering on the following fields: date, location, coordinates, language, author.
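Because the same filters may apply to all three queries, one option is to build the filter once and place it in a leading `$match` stage, which is the stage that can make use of an index. The sketch below is plain JavaScript; `withFilter`, `byConcept`, and the filter values are hypothetical names chosen for illustration. For query 1, the filter would have to be merged into the same `$match` as `$text`, since a `$text` match must be the first stage of a pipeline.

```javascript
// Hypothetical helper: prepend an optional filter to an aggregation pipeline.
// The input array is never mutated; a new array is returned.
function withFilter(filter, stages) {
  const hasFilter = Boolean(filter) && Object.keys(filter).length > 0;
  return hasFilter ? [{ $match: filter }, ...stages] : stages.slice();
}

// Query 3 from the list above, unchanged.
const byConcept = [
  { $unwind: "$concepts" },
  { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
  { $sort: { size: -1 } }
];

// Illustrative filter on two of the fields mentioned above.
const pipeline = withFilter({ language: "en", location: "us" }, byConcept);

// In the mongo shell this would be run as: db.coll.aggregate(pipeline)
```

Filtering before `$unwind` and `$group` also reduces the number of documents the later stages have to process.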

I don't have any indexes in place yet, so the queries run very slowly. But since the indexes would be very different for the different ways I hit the data, does that rule out indexing as a solution? Or is there a way to index for all these cases without having to shard the collection? Basically, my questions are:

  • What would be a good indexing strategy in this case?
  • Do I need to create separate collections for authors and concepts?
  • Should I somehow restructure my data?
  • Do I need to shard my collection, or is a single server with 8GB of RAM powerful enough to handle that data?

Do you have any indexes on your collection?

Have a look at the following:

http://docs.mongodb.org/manual/indexes/
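As a starting point, here is a minimal sketch of candidate indexes for the access patterns described in the question. The choice of keys is an assumption, not something measured against this data set, and `candidateIndexes` and `createCandidateIndexes` are hypothetical names. The only MongoDB call used is `createIndex(keys)` on a collection.

```javascript
// Candidate index specs, one per access pattern (assumptions, not benchmarked).
const candidateIndexes = [
  { text: "text" },          // text index on the "text" field, needed for $text
  { "author.name": 1 },      // filtering by author
  { "concepts.name": 1 },    // multikey index over the concepts array
  { language: 1, date: 1 }   // compound index for language + date filters
];

// Works with any object exposing createIndex(keys), e.g. db.coll in the mongo shell.
function createCandidateIndexes(coll) {
  return candidateIndexes.map(function (keys) {
    return coll.createIndex(keys);
  });
}

// In the mongo shell: createCandidateIndexes(db.coll)
```

Note that a collection can have only one text index, and that a `$group` over the whole collection still has to read every document; these indexes mainly help the filtering step.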

If you do have indexes, make sure they are actually being used by running the following:

db.coll.find({"concepts.name": "WeatherConcept"}).explain();
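To read the result, one approach is to look for an index scan in the winning plan. The sketch below assumes the explain format used by MongoDB 3.0 and later, where the plan sits under `queryPlanner.winningPlan` with stages named `IXSCAN` or `COLLSCAN` (some newer versions nest it one level deeper under `queryPlan`). Older versions instead report a `cursor` field such as `BtreeCursor` or `BasicCursor`. The helper names `usesIndex` and `explainUsesIndex` are hypothetical.

```javascript
// Recursively check whether any stage in a plan tree is an index scan.
function usesIndex(stage) {
  if (!stage) return false;
  if (stage.stage === "IXSCAN") return true;
  const children = (stage.inputStages || []).concat(
    stage.inputStage ? [stage.inputStage] : []
  );
  return children.some(usesIndex);
}

// Accepts the object returned by cursor.explain() (assumed 3.0+ format).
function explainUsesIndex(explainOutput) {
  const planner = explainOutput && explainOutput.queryPlanner;
  if (!planner || !planner.winningPlan) return false;
  const plan = planner.winningPlan.queryPlan || planner.winningPlan;
  return usesIndex(plan);
}

// In the mongo shell:
// explainUsesIndex(db.coll.find({ "concepts.name": "WeatherConcept" }).explain())
```

A `COLLSCAN` stage with no `IXSCAN` anywhere in the plan means the whole collection is being scanned.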

You also need to give us more information about your setup. How much RAM does the server have? I've worked with a MongoDB deployment holding 200GB across 3 shards, so 13GB on a single server shouldn't be an issue.
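A rough way to sanity-check the memory question is to compare the sizes reported by `db.coll.stats()` against available RAM. The sketch below relies on the common rule of thumb that indexes should fit in memory even when the full data set does not. `memoryReport` is a hypothetical helper, and the 1GB index size is a made-up placeholder, not a measurement from this collection.

```javascript
const GB = 1024 * 1024 * 1024;

// stats is expected to look like the result of db.coll.stats(),
// which reports `size` and `totalIndexSize` in bytes by default.
function memoryReport(stats, ramBytes) {
  return {
    dataFitsInRam: stats.size <= ramBytes,
    indexesFitInRam: stats.totalIndexSize <= ramBytes
  };
}

// 13GB of data (from the question), placeholder 1GB of indexes, 8GB of RAM.
const report = memoryReport({ size: 13 * GB, totalIndexSize: 1 * GB }, 8 * GB);

// In the mongo shell: memoryReport(db.coll.stats(), 8 * GB)
```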
