Queries running very slowly on a large MongoDB database

Question

I have a MongoDB database with a single, rather large collection of documents (13GB for about 2M documents) on a single server with 8GB of RAM. Each document has a text field that can be relatively large (it can be a whole blog post); the other fields hold data about the text content and its author. Here's what the schema looks like:

```
{
    text: "Last night there was a storm in San Francisco...",
    author: {
        name: "Firstname Lastname",
        website_url: "http://..."
    },
    date: "201403075612",
    language: "en",
    concepts: [
        {name: "WeatherConcept", hit: "storm", start: 23, stop: 28},
        {name: "LocationConcept", hit: "San Francisco", start: 32, stop: 45}
    ],
    location: "us",
    coordinates: []
}
```

I'm planning to query the data in several different ways:

Full-text search on the "text" field. Say my text search query is q:
```
 db.coll.aggregate([ { $match:{ $text: { $search:q } } } ]) 
```
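A `$text` query needs a text index on the searched field, and a collection can have at most one text index. In an aggregation, the `$match` stage that contains `$text` must be the first stage, so any extra filters are best merged into that same `$match`. The sketch below only builds the index key and pipeline objects in plain JavaScript, so it runs without a server; `buildTextSearchPipeline` and the shape of its `filters` argument are hypothetical names for illustration. One schema detail worth checking: by default a text index treats a document field named `language` as that document's language override, so the `language: "en"` values will influence stemming.

```javascript
// Index key for a single-field text index on "text". A collection can
// have at most one text index. In the mongo shell you would run:
//   db.coll.createIndex(TEXT_INDEX_KEY)
const TEXT_INDEX_KEY = { text: "text" };

// Hypothetical helper: builds the aggregation pipeline for a full-text
// query q. Extra filters go into the same $match as $text, because a
// $match stage containing $text must be the first stage of the pipeline.
function buildTextSearchPipeline(q, filters = {}) {
  return [{ $match: { $text: { $search: q }, ...filters } }];
}

// Example: search for "storm" in English posts from the US.
const textPipeline = buildTextSearchPipeline("storm", { language: "en", location: "us" });
console.log(JSON.stringify(textPipeline));
// In the mongo shell: db.coll.aggregate(textPipeline)
```

Without the text index, the real `$text` query is rejected outright rather than just running slowly, so this index is a prerequisite for the first query type rather than an optimization.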

Aggregate documents by author:

```
db.coll.aggregate([
    { $project: { name: "$author.name", url: "$author.website_url" } },
    { $group: { _id: "$name", size: { $sum: 1 }, url: { $first: "$url" } } },
    { $sort: { size: -1 } }
])
```
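To make the semantics of this pipeline concrete, here is a plain-JavaScript emulation over an in-memory array (no database involved; `groupByAuthor` and the sample documents are hypothetical). It mirrors the three stages: project the name and URL, group by name while counting documents and keeping the first URL seen, then sort by count in descending order.

```javascript
// Plain-JavaScript emulation of the "by author" pipeline:
//   { $project: { name: "$author.name", url: "$author.website_url" } },
//   { $group: { _id: "$name", size: { $sum: 1 }, url: { $first: "$url" } } },
//   { $sort: { size: -1 } }
function groupByAuthor(docs) {
  const groups = new Map(); // author name -> { _id, size, url }
  for (const doc of docs) {
    // $project: lift the nested fields; a missing name groups under null.
    const name = doc.author?.name ?? null;
    const url = doc.author?.website_url ?? null;
    // $group: count documents and keep the url of the first one seen.
    // In MongoDB "first" follows input order, which is unspecified
    // unless a $sort precedes the $group.
    if (!groups.has(name)) groups.set(name, { _id: name, size: 0, url });
    groups.get(name).size += 1;
  }
  // $sort: { size: -1 } (largest groups first).
  return [...groups.values()].sort((a, b) => b.size - a.size);
}

const authorSample = [
  { text: "...", author: { name: "Ann", website_url: "http://a.example" } },
  { text: "...", author: { name: "Bob", website_url: "http://b.example" } },
  { text: "...", author: { name: "Ann", website_url: "http://a2.example" } },
];
console.log(groupByAuthor(authorSample));
```

The emulation also makes the cost visible: with no `$match` in front, every document flows into `$group`, so the real pipeline generally has to read the whole collection no matter which indexes exist.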

Aggregate documents by concepts:

```
db.coll.aggregate([
    { $unwind: "$concepts" },
    { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
    { $sort: { size: -1 } }
])
```
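Because `$unwind` runs before `$group`, this pipeline counts concept mentions, not documents: a post that mentions the same concept twice contributes 2 to that concept's `size`. The plain-JavaScript emulation below (hypothetical names and sample data, no database) shows that behaviour; if "number of documents per concept" is what is wanted, the duplicate names within each document would need to be removed before grouping.

```javascript
// Plain-JavaScript emulation of the "by concept" pipeline:
//   { $unwind: "$concepts" },
//   { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
//   { $sort: { size: -1 } }
// Assumes "concepts" is always an array, as in the schema above.
function countConcepts(docs) {
  const counts = new Map(); // concept name -> number of mentions
  for (const doc of docs) {
    // $unwind: one record per array element; a missing or empty
    // array yields no records for that document.
    for (const concept of doc.concepts ?? []) {
      const name = concept.name ?? null;
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([name, size]) => ({ _id: name, size }))
    .sort((a, b) => b.size - a.size);
}

const conceptSample = [
  { concepts: [{ name: "WeatherConcept" }, { name: "LocationConcept" }] },
  { concepts: [{ name: "LocationConcept" }, { name: "LocationConcept" }] },
  { concepts: [] },
];
console.log(countConcepts(conceptSample));
```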

Any of these three queries may also filter on the following fields: date, location, coordinates, language, author.
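A common pattern is to turn the optional filters into a single `$match` and put it at the very start of each pipeline: a leading `$match` can use an index on the filtered fields, and every later stage then sees fewer documents. The sketch below is a hypothetical helper, not a MongoDB API; the `filters` shape, the choice of `author.name` for the author filter, and the assumption that `date` strings are fixed-width (so string order equals chronological order) are all illustrative. `coordinates` is left out because geospatial filters need a geospatial index and operators such as `$geoWithin`. With filters in front, a compound index on the filtered fields (for example `{ language: 1, location: 1, date: 1 }`) could serve the `$match`, though the right fields depend on which filters are actually common.

```javascript
// Hypothetical filter shape: { language, location, author, dateFrom, dateTo }.
// Any field left undefined is simply not filtered on.
function buildMatch({ language, location, author, dateFrom, dateTo } = {}) {
  const match = {};
  if (language !== undefined) match.language = language;
  if (location !== undefined) match.location = location;
  if (author !== undefined) match["author.name"] = author;
  // Assumes dates are fixed-width strings (like "201403075612"), so that
  // string comparison matches chronological order.
  if (dateFrom !== undefined || dateTo !== undefined) {
    match.date = {};
    if (dateFrom !== undefined) match.date.$gte = dateFrom;
    if (dateTo !== undefined) match.date.$lt = dateTo;
  }
  return match;
}

// Puts the filters at the start of a pipeline. If the pipeline already
// begins with a $match (e.g. the $text one), the filters are merged into
// it, because a $match containing $text must stay the first stage.
function withFilters(pipeline, filters) {
  const match = buildMatch(filters);
  if (Object.keys(match).length === 0) return pipeline;
  const [first, ...rest] = pipeline;
  if (first && first.$match) {
    return [{ $match: { ...first.$match, ...match } }, ...rest];
  }
  return [{ $match: match }, ...pipeline];
}

const byConcept = [
  { $unwind: "$concepts" },
  { $group: { _id: "$concepts.name", size: { $sum: 1 } } },
  { $sort: { size: -1 } },
];
const filtered = withFilters(byConcept, { language: "en", dateFrom: "201401000000" });
const filteredText = withFilters([{ $match: { $text: { $search: "storm" } } }], { location: "us" });
console.log(JSON.stringify(filtered));
console.log(JSON.stringify(filteredText));
```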

I don't have any indexes in place yet, so the queries run very slowly. But since the indexes would be very different for the different ways I hit the data, does that rule out indexing as a solution? Or is there a way to index for all these cases without having to shard the collection? Basically, my questions are:

- What would be a good indexing strategy in this case?
- Do I need to create separate collections for authors and concepts?
- Should I somehow restructure my data?
- Do I need to shard my collection, or is my single server with 8GB of RAM powerful enough to handle this data?

Answer 1

Do you have any indexes on your collection?

Have a look at the following:

http://docs.mongodb.org/manual/indexes/

If you do have indexes, make sure they are being hit by doing the following:

```
db.coll.find({ "concepts.name": "WeatherConcept" }).explain();
```
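Reading `explain()` output by eye can be tedious for nested plans. As a rough sketch, the function below walks the winning plan of a `find().explain()` result and collects every `stage` name, so you can check whether a `COLLSCAN` (full collection scan) appears instead of an `IXSCAN` (index scan). The function names and the trimmed sample plan are hypothetical; real explain output has many more fields and its exact nesting varies between MongoDB versions, which is why the sketch searches the whole winning-plan subtree rather than a fixed path.

```javascript
// Recursively collects every "stage" name found in a plan tree
// (it follows inputStage, inputStages, queryPlan and any other nesting).
function collectStages(node, out = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectStages(item, out);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") out.push(node.stage);
    for (const value of Object.values(node)) collectStages(value, out);
  }
  return out;
}

// True if the winning plan of a find().explain() result contains a full
// collection scan. Rejected plans are deliberately ignored.
function usesCollectionScan(explainResult) {
  const plan = explainResult?.queryPlanner?.winningPlan ?? explainResult;
  return collectStages(plan).includes("COLLSCAN");
}

// Trimmed, hypothetical explain() output for a query served by an index.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "concepts.name_1" },
    },
    rejectedPlans: [{ stage: "COLLSCAN" }],
  },
};

console.log(collectStages(sampleExplain.queryPlanner.winningPlan));
console.log(usesCollectionScan(sampleExplain));
```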

You also need to give us more information about your setup. How much RAM does the server have? I've worked with a MongoDB deployment that had 200GB spread across 3 shards, so 13GB on one server shouldn't be an issue.

Answer 1: score 0, posted 2014-04-17 13:30:48

查询在大型MongoDB数据库上运行非常慢

问题描述

1 个解决方案

解决方案1 0 2014-04-17 13:30:48

解决方案1
0 2014-04-17 13:30:48