MongoDB databases of 5M and 150M webpages are really, really slow
I'm storing a collection of around 150M webpages in MongoDB. The pages vary in size. The only operation I need is to retrieve pages by their id (my own id field, not MongoDB's default _id). However, queries take far too long and I haven't managed to retrieve a single document yet, even though db.collection.findOne() works perfectly. So I indexed a subset of 5M webpages for testing and repair. When I issue the query db.collection.find({"id": "aw-000"}) against this db, it takes 4 minutes or more to get a document.
I tried db.runCommand({compact: 'collection'}), but it didn't help!
When I checked the log at var/log/mongodb/mongod.log (which should contain every query that took more than 100 ms), I found this:
655163:2017-07-16T14:05:37.231+0300 I COMMAND [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after connections: 0, after extra_info: 310, after globalLock: 310, after locks: 310, after network: 310, after opcounters: 310, after opcountersRepl: 310, after storageEngine: 310, after tcmalloc: 310, after wiredTiger: 310, at end: 1220 }
However, I don't know how to make use of such logs.
Is there a way to make my db more efficient?
As Neil Lunn pointed out in the comments above, the easiest solution I found is to create the db from scratch, using _id as my id field name instead of "id". _id is indexed by default, and the only type of query I issue is retrieval by id, which that index serves directly.
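To illustrate why the lookup on the unindexed "id" field is so slow, here is a rough sketch in plain JavaScript. It is not MongoDB code: the array stands in for a collection, and a Map stands in for an index (MongoDB's real indexes are tree-based, so this is only an analogy). The document count and the generated ids are made up for illustration.

```javascript
// Simulated collection: 200,000 documents keyed by a custom "id" field.
// (Hypothetical data; ids follow the "aw-000" pattern from the question.)
const docs = [];
for (let i = 0; i < 200000; i++) {
  docs.push({ id: "aw-" + String(i).padStart(3, "0"), page: "..." });
}

// Without an index: examine documents one by one until a match is found.
function findByScan(collection, id) {
  let examined = 0;
  for (const doc of collection) {
    examined++;
    if (doc.id === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// With an index: build a lookup structure once, then jump straight to the document.
function buildIndex(collection) {
  const index = new Map();
  for (const doc of collection) index.set(doc.id, doc);
  return index;
}

const index = buildIndex(docs);
const target = "aw-199999"; // the last document, the worst case for a scan
const scanned = findByScan(docs, target);
const indexed = index.get(target);

console.log(scanned.examined); // how many documents the scan had to touch
console.log(indexed.id);       // found with a single lookup
```

The scan has to touch every document before it reaches the last one, while the indexed lookup does not depend on the position of the document. With 150M pages of varying size on disk, that difference is what separates minutes from milliseconds.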
So the program (whatever program is used to create the index) will insert each object as follows:
db.collection.insert( { _id: "aw-000", page: "...", .... } )
instead of:
db.collection.insert( { id: "aw-000", page: "...", .... } )
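If rebuilding the db is not practical, a possible alternative (not something I tested on this data) is to keep the "id" field and add an index on it. The sketch below is meant for the mongo shell; addIdIndex and findPage are hypothetical helper names, and the unique option assumes that no two pages share an id.

```javascript
// Sketch for the mongo shell; `coll` is a collection handle such as db.collection.
// Helper names are illustrative only, not part of MongoDB.
function addIdIndex(coll) {
  // Build an ascending index on the custom "id" field.
  // unique: true assumes every page has a distinct id (an assumption).
  return coll.createIndex({ id: 1 }, { unique: true });
}

function findPage(coll, pageId) {
  // The filter is passed as a document, so the braces are required.
  return coll.findOne({ id: pageId });
}
```

In the shell this would be used as addIdIndex(db.collection) followed by findPage(db.collection, "aw-000"). Building the index on a large collection can itself take a long time, but it only has to be done once.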