简体   繁体   English

mongodb:limit() 会提高查询速度吗?

[英]mongodb: will limit() increase query speed?

Is db.inventory.find().limit(10) faster than db.inventory.find() ? db.inventory.find().limit(10)db.inventory.find()快吗?

I have millions of records in mongodb, I want to get top 10 records in some orders.我在mongodb中有数百万条记录,我想在某些订单中获得前10条记录。

Using limit() you inform the server that you will not retrieve more than k documents.使用limit()通知服务器您将不会检索超过k 个文档。 Allowing some optimizations to reduce bandwidth consumption and to speed-up sorts.允许进行一些优化以减少带宽消耗并加速排序。 Finally, using a limit clause the server will be able to better use the 32MB max available when sorting in RAM (ie: when sort order cannot be obtained from an index).最后,使用限制子句,服务器将能够在 RAM 中排序时更好地使用 32MB 最大可用空间(即:无法从索引中获取排序顺序时)。


Now, the long story: find() returns a cursor.现在,长话短说: find()返回一个游标。 By default, the cursor will transfer the results to the client in batches.默认情况下,游标会将结果批量传输到客户端。 From the documentation ,:文档中,:

For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte.对于大多数查询,第一批返回 101 个文档或刚好足以超过 1 兆字节的文档。 Subsequent batch size is 4 megabytes.后续批次大小为 4 兆字节。

Using limit() the cursor will not need to retrieve more documents than necessary.使用limit()游标将不需要检索比必要更多的文档。 Thus reducing bandwidth consumption and latency.从而减少带宽消耗和延迟。

Please notice that, given your use case, you will probably use a sort() operation as well.请注意,鉴于您的用例,您可能还会使用sort()操作。 From the same documentation as above:来自与上述相同的文档:

For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.对于包含没有索引的排序操作的查询,服务器必须在内存中加载所有文档以在返回任何结果之前执行排序。

And the sort() documentation page explains further: sort() 文档页面进一步解释了:

If MongoDB cannot obtain the sort order via an index scan, then MongoDB uses a top-k sort algorithm.如果 MongoDB 无法通过索引扫描获得排序顺序,则 MongoDB 使用 top-k 排序算法。 This algorithm buffers the first k results (or last, depending on the sort order) seen so far by the underlying index or collection access.该算法缓冲目前由底层索引或集合访问看到的前 k 个结果(或最后一个,取决于排序顺序)。 If at any point the memory footprint of these k results exceeds 32 megabytes, the query will fail 1 .如果在任何时候这 k 个结果的内存占用超过 32 兆字节,则查询将失败1


1 That 32 MB limitation is not specific to sort using a limit() clause. 1 32 MB 的限制并非特定于使用limit()子句进行排序。 Any sort whose order cannot be obtained from an index will suffer from the same limitation.无法从索引中获取顺序的任何排序都将受到相同的限制。 However, with a plain sort the server need to hold all documents in its memory to sort them.但是,对于普通排序,服务器需要将所有文档保存在其内存中以对其进行排序。 With a limited sort, it only have to store k documents in memory at the same time.有限的排序中,它只需要同时在内存中存储k 个文档。

if you need it in order then of course the DB would first sort it based on the criteria and then return the top 10 records.如果您按顺序需要它,那么数据库当然会首先根据条件对其进行排序,然后返回前 10 条记录。 by using the limit you are just saving the network bandwidth.通过使用限制,您只是在节省网络带宽。 eg here I am sorting by name and then giving the top 10 records, it has to scan the whole data and then pick the top 10. (as you can notice its doing COLLSCAN which is understood for collection scan as I don't have the index for this example, the idea to show here is that its doing the full scan of all the records, sort it and then pick the top ones.)例如,这里我按名称排序,然后给出前 10 条记录,它必须扫描整个数据,然后选择前 10 条。(你可以注意到它在执行 COLLSCAN,这被理解为集合扫描,因为我没有这个例子的索引,这里展示的想法是它对所有记录进行全面扫描,对其进行排序,然后选择顶部的记录。)

> db.t1.find().sort({name:1}).limit(10).explain()
{
    "queryPlanner" : {
            "plannerVersion" : 1,
            "namespace" : "test.t1",
            "indexFilterSet" : false,
            "parsedQuery" : {
                    "$and" : [ ]
            },
            "winningPlan" : {
                    "stage" : "SORT",
                    "sortPattern" : {
                            "name" : 1
                    },
                    "limitAmount" : 10,
                    "inputStage" : {
                            "stage" : "COLLSCAN",
                            "filter" : {
                                    "$and" : [ ]
                            },
                            "direction" : "forward"
                    }
            },
            "rejectedPlans" : [ ]
    },
    "serverInfo" : {
            "host" : "Sachin-Mac.local",
            "port" : 27017,
            "version" : "3.0.2",
            "gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f"
    },
    "ok" : 1
}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM