
Mongo DB query subset

I currently have a MongoDB setup with a fairly large database (about 250m documents). At present, I have one main collection that holds the majority of the data and has a single index (time). This gives acceptable query times when only the time appears in the where part of the query (the index is used).

The problem arises when I need to use a compound key: the time index uses about 2.5GB of memory, and I only have 4GB on the server, so I don't want to create a compound key index, since that would prevent all indexes from fitting in memory and thus slow things down a lot.

So my question is this: can I query first for time, and then query that subset for the other variables?

I should point out that I am using the Ruby driver.

At the moment, my query looks like this (this is very slow):

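# find_one returns a single document rather than a cursor,
# so the sort order is passed as an option instead of being chained.
trade_stop_loss_time = ticks.find_one({
        "time" => { "$gt" => trade_time_open, "$lte" => trade_time_close },
        "bid"  => { "$lte" => stop_loss_price }
    }, :sort => [["time", Mongo::ASCENDING]])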

Thanks!

If you simply perform the query you present, the database should be smart enough to do exactly that.

The query you have should basically filter down the candidate set using the time index, then scan the remaining objects for the bid parameter. This should be a lot more efficient than doing the scan on the client.

You should definitely run explain() on your query to find out what it's doing. If it uses an index (BtreeCursor) and the number of scanned objects is just the number of items in the given time frame, it's doing fine. I don't think there's a better way than that, given your constraints. Doing the same operation on the client will definitely be slower.
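As a rough sketch of that check in Ruby: the helper below is hypothetical (it is not part of the driver) and assumes the older explain() output format, where the plan is a hash with "cursor" and "nscanned" keys. Newer server versions report a different structure. With the 1.x Ruby driver the plan would presumably come from something like ticks.find(selector).explain; the sample plans here are written by hand for illustration and are not real server output.

```ruby
# Hypothetical helper: judges an explain() plan in the legacy output format.
# `plan` is a Hash with "cursor" and "nscanned" keys; `docs_in_window` is the
# number of documents that fall inside the queried time range.
def time_index_used?(plan, docs_in_window)
  # An index scan is reported as "BtreeCursor <index name>" in this format.
  uses_index = plan["cursor"].to_s.start_with?("BtreeCursor")
  # Scanning more entries than the window holds means extra work is done.
  scans_only_window = plan["nscanned"].to_i <= docs_in_window
  uses_index && scans_only_window
end

# Hand-written sample plans (assumptions, not captured from a server).
index_plan = { "cursor" => "BtreeCursor time_1", "nscanned" => 1200, "n" => 1 }
full_scan  = { "cursor" => "BasicCursor", "nscanned" => 250_000_000, "n" => 1 }

puts time_index_used?(index_plan, 1200)
puts time_index_used?(full_scan, 1200)
```

If the check fails for the real plan, the query is either not using the time index or touching far more documents than the time window contains.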

Of course, a limit and a small time frame will help to make your query faster, but these might be external factors. mongostat might also help to find the problem.

However, if your documents and/or time spans are large, it might still be better to add the compound index: loading a lot of large documents from disk (since your RAM is already full) will take some time. Paging the index from disk is also slow, but it's much less data.

A good answer can only be found by experiment.

You could return the results using just the time index and then filter them further on the client side. Other than that, I think you're pretty much out of luck.
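A minimal sketch of that client-side filter, assuming the time-only query returns the ticks sorted by time as hashes with "time" and "bid" keys. The helper name and the sample data are made up for illustration; with the driver, the first argument would be the cursor from the time-only find rather than an array.

```ruby
# Hypothetical helper: returns the time of the first tick (in iteration order)
# whose bid is at or below the stop-loss price, or nil if no tick qualifies.
# `ticks_in_window` is any Enumerable of hashes, already sorted by time.
def first_stop_loss_time(ticks_in_window, stop_loss_price)
  # Enumerable#find stops at the first match, so later ticks are not examined.
  hit = ticks_in_window.find { |tick| tick["bid"] <= stop_loss_price }
  hit && hit["time"]
end

# Made-up sample standing in for the result of a time-only query.
sample_ticks = [
  { "time" => Time.utc(2011, 1, 3, 9, 0, 0), "bid" => 1.3350 },
  { "time" => Time.utc(2011, 1, 3, 9, 0, 1), "bid" => 1.3341 },
  { "time" => Time.utc(2011, 1, 3, 9, 0, 2), "bid" => 1.3329 },
  { "time" => Time.utc(2011, 1, 3, 9, 0, 3), "bid" => 1.3320 }
]

puts first_stop_loss_time(sample_ticks, 1.3330).inspect
```

Because the scan stops at the first match, only the documents up to the stop-loss hit have to be pulled from the server, but every one of those still crosses the network.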
