
Mongo DBs of 5M and 150M webpages are really, really slow

I'm storing a collection of around 150M webpages in MongoDB, each page of a different size. The only operation I need is retrieving pages by their own id (not MongoDB's default _id). However, retrieval takes far too long, and I haven't managed to retrieve a single document yet, even though db.collection.findOne() works perfectly. So I indexed a subset of 5M webpages for testing and troubleshooting. When I issue the query db.collection.find({"id": "aw-000"}) against this smaller db, it takes 4 minutes or more to return a document.

I tried running db.runCommand({compact: 'collection'}), but it didn't help!

When I checked the log at var/log/mongodb/mongod.log (which should contain every query that took more than 100 ms), I found this:

655163:2017-07-16T14:05:37.231+0300 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after connections: 0, after extra_info: 310, after globalLock: 310, after locks: 310, after network: 310, after opcounters: 310, after opcountersRepl: 310, after storageEngine: 310, after tcmalloc: 310, after wiredTiger: 310, at end: 1220 }

However, I don't know how to make use of logs like this.
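One way to read that line is to turn its checkpoints into per-section durations. This is a minimal sketch, assuming that each `after <section>` value is the number of milliseconds elapsed since serverStatus started (so a section's own cost is the jump from the previous checkpoint). The function name `serverStatusSectionTimes` is hypothetical. Note that the `[ftdc]` tag marks this as MongoDB's background diagnostic data capture calling serverStatus, not the find() query itself.

```javascript
// Hypothetical helper: split a "serverStatus was very slow: { ... }" log line
// into per-section durations, slowest first.
// Assumption: every "after <section>: <n>" value is cumulative milliseconds.
function serverStatusSectionTimes(logLine) {
  // Only look at the {...} payload, so colons in the timestamp are ignored.
  const start = logLine.indexOf("{");
  const end = logLine.lastIndexOf("}");
  if (start === -1 || end <= start) return [];
  const payload = logLine.slice(start + 1, end);

  const sections = [];
  let previous = 0;
  for (const part of payload.split(",")) {
    const [rawName, rawMs] = part.split(":");
    if (rawMs === undefined) continue; // skip anything that is not "name: value"
    const name = rawName.trim().replace(/^after /, "");
    const cumulativeMs = Number(rawMs.trim());
    // Time spent in this section = increase since the previous checkpoint.
    sections.push({ section: name, ms: cumulativeMs - previous });
    previous = cumulativeMs;
  }
  // Stable sort: sections with equal durations keep their log order.
  return sections.sort((a, b) => b.ms - a.ms);
}
```

For the line above, the two largest entries come out as `at end` (910 ms) and `extra_info` (310 ms), and every other section adds 0 ms. That only tells you which parts of the server's own statistics collection were slow; it says nothing directly about why the find() took minutes.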

Is there a way to make my db more efficient?

As Neil Lunn pointed out in the comments above, the easiest solution I found was to recreate the db from scratch, using _id as my id field name instead of "id". Without an index on "id", every find() on that field has to scan the whole collection. _id, by contrast, is indexed by default, and since the only queries I will ever issue are lookups by id, they can all use that index.
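The reasoning in this answer can be illustrated with a toy model in plain JavaScript. It is not MongoDB's storage engine or query planner: a lookup on an unindexed field has to examine documents one by one, while an index (like the one MongoDB builds on _id by default) maps a key straight to its document. The helper names `findByScan` and `buildIndex` are hypothetical.

```javascript
// Toy model only: MongoDB's real indexes are B-trees managed by the storage
// engine, not JavaScript Maps. This just contrasts the two access patterns.

// Unindexed lookup: examine every document until one matches (a collection scan).
function findByScan(docs, field, value) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc[field] === value) return { doc, examined };
  }
  return { doc: null, examined };
}

// Indexed lookup: build a key -> document map once; each query is then one probe.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Keys are assumed unique, like _id; a repeated key would overwrite the earlier doc.
    index.set(doc[field], doc);
  }
  return index;
}

// Usage: 100,000 fake pages; looking up the last one forces a full scan.
const pages = Array.from({ length: 100000 }, (_, i) => ({ id: "aw-" + i, page: "..." }));
const scanned = findByScan(pages, "id", "aw-99999");
console.log("documents examined by scan:", scanned.examined); // 100000
const byId = buildIndex(pages, "id");
console.log("found via index:", byId.get("aw-99999").id); // aw-99999
```

At 150M documents the scan side of this picture grows proportionally, which is consistent with a single unindexed lookup taking minutes.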

In practice, the program that builds the index (whichever program that is) inserts each object like this:

db.collection.insert({ _id: "aw-000", page: "...", /* other fields */ })

instead of:

db.collection.insert({ id: "aw-000", page: "...", /* other fields */ })
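If the pages already exist as objects with an `id` field, a small helper can do the renaming before insertion. This is a minimal sketch; `toMongoDoc` is a hypothetical name, and the input shape `{ id, page, ... }` is assumed from the examples above.

```javascript
// Hypothetical helper: move a page's own "id" into MongoDB's "_id" field so that
// lookups by id use the built-in _id index. All other fields are copied unchanged.
function toMongoDoc(page) {
  if (!Object.prototype.hasOwnProperty.call(page, "id")) {
    throw new Error("page has no id field");
  }
  const doc = { _id: page.id };
  for (const key of Object.keys(page)) {
    if (key !== "id") doc[key] = page[key];
  }
  return doc;
}

const converted = toMongoDoc({ id: "aw-000", page: "..." });
console.log(JSON.stringify(converted)); // {"_id":"aw-000","page":"..."}
```

Each converted object would then be passed to db.collection.insert(...), after which db.collection.find({ _id: "aw-000" }) is served by the _id index. Because _id values must be unique, a repeated id is rejected with a duplicate key error instead of being stored twice.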

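Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.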
