Mongodb: big data structure
I'm rebuilding my website, which is a search engine for nicknames from the most active forum in France: you search for a nickname and you get all of its messages. My current database contains more than 60 GB of data, stored in a MySQL database. I'm now rewriting it into a MongoDB database, and after inserting 1 million messages (1 message = 1 document), find() started to take a while.
The structure of a document is as follows:
{
    "_id" : ObjectId(),
    "message" : "<p>Hai guys</p>",
    "pseudo" : "mahnickname", // from a nickname (*pseudo* in my db)
    "ancre" : "774497928",    // its id in the forum
    "datepost" : "30/11/2015 20:57:44"
}
I set the id ancre as unique, so I don't get the same entry twice. Then the user enters a nickname and it finds all documents that have that nickname. Here is the request:
Model.find({pseudo: "danickname"}).sort('-datepost').skip((r_page -1) * 20).limit(20).exec(function(err, bears)...
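One caveat worth noting: datepost is stored as a "DD/MM/YYYY HH:mm:ss" string, so sort('-datepost') compares it lexicographically, not chronologically. A minimal sketch of the problem in plain JavaScript (parseDatepost is a hypothetical helper, not part of the original code):

```javascript
// Hypothetical helper: parse the "DD/MM/YYYY HH:mm:ss" strings shown above.
function parseDatepost(s) {
  const [date, time] = s.split(" ");
  const [day, month, year] = date.split("/").map(Number);
  const [h, m, sec] = time.split(":").map(Number);
  return new Date(Date.UTC(year, month - 1, day, h, m, sec));
}

const posts = ["01/12/2015 08:00:00", "30/11/2015 20:57:44"];

// Descending string sort puts "30/11/2015" before "01/12/2015", even though
// 1 Dec 2015 is the more recent message.
const byString = [...posts].sort().reverse();

// Correct descending order via real Date values.
const byDate = [...posts].sort((a, b) => parseDatepost(b) - parseDatepost(a));
```

Storing datepost as a real Date (ISODate in MongoDB) would make the sort both correct and index-friendly.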
Should I structure it differently? Instead of having one document for each message, should I have a document for each nickname and update that document whenever I get a new message from that nickname? I was using the first method with MySQL and it wasn't taking that long.
Edit: Or maybe I should just index the nicknames (pseudo)?

Thanks!
Here are some recommendations for your problem about big data:
The ObjectId in _id already embeds a creation timestamp, so you could drop the datepost field and save some disk space.

Do you really need the ancre field? The ObjectId is already unique and indexed. If you really need it and want to keep datepost as a separate field too, you could replace the _id field with your ancre field.
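An ObjectId embeds a Unix timestamp (in seconds) in its first 4 bytes, which is the first 8 characters of its hex representation. A sketch of recovering it in plain JavaScript (the hex id below is a made-up example):

```javascript
// The first 8 hex characters of an ObjectId are a Unix timestamp in seconds,
// so a document's creation time can be recovered without any datepost field.
function objectIdToDate(hexId) {
  return new Date(parseInt(hexId.substring(0, 8), 16) * 1000);
}

// 0x565cb7e8 = 1448916968 s → 30 Nov 2015 20:56:08 UTC (example id)
const created = objectIdToDate("565cb7e80000000000000000");
```

The MongoDB Node driver (and therefore Mongoose) also exposes this directly as doc._id.getTimestamp().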
Add an index on pseudo. This will make the "get all messages where the pseudo is mahnickname" search much faster.

Keep an eye on the RAM your indexes consume: you can check it with db.collection.stats(), looking at the indexSizes sub-document.
Use the datepost field or the timestamp in _id for your paging strategy. If you decide on using datepost, make a compound index on pseudo and datepost.

As for your benchmarks, you can closely monitor MongoDB by using mongotop and mongostat.