Mongo database with 5M and 150M webpages is really slow

Question

I'm storing a collection of about 150M webpages in MongoDB. The pages vary in size. The only operation I need is retrieving a page by its id (my own id field, not MongoDB's default _id). However, these lookups take far too long, and I have not managed to retrieve a single document this way yet, although db.collection.findOne() works perfectly. So I loaded a subset of 5M webpages into a separate database for testing and troubleshooting. When I issue the query db.collection.find({"id": "aw-000"}) against this database, it takes 4 minutes or more to return a document.

I tried db.runCommand({compact: 'collection'}), but it didn't help!

When I checked the log at var/log/mongodb/mongod.log (which should contain every query that took more than 100 ms), I found this:

655163:2017-07-16T14:05:37.231+0300 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after connections: 0, after extra_info: 310, after globalLock: 310, after locks: 310, after network: 310, after opcounters: 310, after opcountersRepl: 310, after storageEngine: 310, after tcmalloc: 310, after wiredTiger: 310, at end: 1220 }

However, I don't know how to make use of such logs.

Is there a way to make my database more efficient?

Answer 1

As Neil Lunn pointed out in the comments above, I found that the easiest solution is to recreate the database from scratch, using _id as the name of my id field instead of "id". MongoDB indexes _id by default, and the only queries I issue are lookups by id, so they are all served by that index.

So the program (whichever program is used to build the collection) inserts each document as follows:

db.collection.insert( { _id: "aw-000", page: "...", .... } )

instead of:

db.collection.insert( { id: "aw-000", page: "...", .... } )
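To illustrate why this change helps, here is a toy in-memory sketch in plain JavaScript (it is not MongoDB code and does not talk to a server). It contrasts a lookup on an unindexed field, which has to examine documents one by one, with a lookup through an index, modelled here as a Map. The function names scanById, buildIndex and lookupById and the generated sample documents are hypothetical; a real server uses B-tree indexes on disk, so this only conveys the idea.

```javascript
// Toy model: a "collection" is just an array of documents.
const pages = [];
for (let i = 0; i < 100000; i++) {
  pages.push({ id: "aw-" + String(i).padStart(6, "0"), page: "..." });
}

// Unindexed lookup: examine documents until one matches (a collection scan).
function scanById(docs, id) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.id === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// Index: a map from key to document, built once up front.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) index.set(doc.id, doc);
  return index;
}

// Indexed lookup: a single probe into the map.
function lookupById(index, id) {
  const doc = index.get(id);
  return { doc: doc || null, examined: doc ? 1 : 0 };
}

const index = buildIndex(pages);
const target = "aw-099999"; // the last document, the worst case for a scan
const viaScan = scanById(pages, target);
const viaIndex = lookupById(index, target);

// Number of documents each strategy had to examine.
console.log(viaScan.examined, viaIndex.examined);
```

In this model the scan examines every document before reaching the last one, while the indexed lookup examines one. In MongoDB itself, an index on an existing field can also be created with db.collection.createIndex({ id: 1 }), which may be an alternative to rebuilding the collection, although building it over 150M documents takes time.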

Answer 1: 0, accepted, 2017-07-19 08:56:20
