
MongoDB - Querying performance for over 10 million records

First of all: I have already read a lot of posts about MongoDB query performance, but I didn't find any good solution.

Inside the collection, the document structure looks like this:

{
    "_id" : ObjectId("535c4f1984af556ae798d629"),
    "point" : [
        -4.372925494081455,
        41.367710205649544
    ],
    "location" : [
        {
            "x" : -7.87297955453618,
            "y" : 73.3680160842939
        },
        {
            "x" : -5.87287143362673,
            "y" : 73.3674043270052
        }
    ],
    "timestamp" : NumberLong("1781389600000")
}

My collection already has an index:

db.collection.ensureIndex({timestamp:-1})

The query looks like this:

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}})

Despite this, the response time is too high, about 20 to 30 seconds (the exact time depends on the specified query params).

Any help is useful!

Thanks in advance.

EDIT: I changed the find params, replacing them with real data.

The above query takes 46 seconds, and this is the information given by the explain() function:

{
    "cursor" : "BtreeCursor timestamp_1",
    "isMultiKey" : false,
    "n" : 124494,
    "nscannedObjects" : 124494,
    "nscanned" : 124494,
    "nscannedObjectsAllPlans" : 124494,
    "nscannedAllPlans" : 124494,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 45,
    "nChunkSkips" : 0,
    "millis" : 46338,
    "indexBounds" : {
        "timestamp" : [
            [
                1380520800000,
                1380558200000
            ]
        ]
    },
    "server" : "ip-XXXXXXXX:27017"
}

The explain output couldn't be more ideal. You found 124,494 documents via the index (nscanned) and they were all valid results, so they were all returned (n). It still wasn't an index-only (covered) query, because the query returns whole documents, while the index only contains the timestamp field.
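To make that reading of the explain output concrete, here is a minimal sketch in plain JavaScript. The helper `summarizeExplain` is hypothetical (it is not part of MongoDB), and the field names assume the legacy explain format shown in the question.

```javascript
// Fields copied from the explain() output in the question (legacy format).
const explainOutput = {
  cursor: "BtreeCursor timestamp_1",
  n: 124494,
  nscannedObjects: 124494,
  nscanned: 124494,
  indexOnly: false,
  millis: 46338
};

// Hypothetical helper (not a MongoDB API): summarizes how well the index
// served the query.
function summarizeExplain(e) {
  return {
    // A BtreeCursor means an index was used instead of a full collection scan.
    usedIndex: e.cursor.startsWith("BtreeCursor"),
    // 1 means every scanned index entry produced a returned document.
    selectivity: e.n / e.nscanned,
    // Documents had to be fetched unless the query was covered by the index.
    fetchedDocuments: e.indexOnly ? 0 : e.nscannedObjects,
    // Average wall-clock time spent per returned document.
    msPerDocument: e.millis / e.n
  };
}

const summary = summarizeExplain(explainOutput);
console.log(summary);
```

A selectivity of 1 means the index did no wasted work, so the remaining time is likely spent fetching and transferring the documents themselves.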

The reason why this query is a bit slow could be the huge amount of data it returns. All the documents you found must be read from the hard drive (when the collection is cold), scanned, serialized, sent to the client over the network, and deserialized by the client.

Do you really need that much data for your use case? If the answer is yes, does responsiveness really matter? I do not know what kind of application you actually want to create, but I am guessing that yours is one of three use cases:

  1. You want to show all that data in the form of some kind of report. That would mean the output would be a huge list the user has to scroll through. In that case I would recommend using pagination. Only load as much data as fits on one screen and provide next and previous buttons. MongoDB pagination can be done with the cursor methods .limit(n) and .skip(n).
  2. The above, but it is some kind of offline report the user can download and then examine with all kinds of data-mining tools. In that case the initial load time would be acceptable, because the user will spend some time with the data they received.
  3. You don't want to show all of that raw data to the user, but process it and present it in some kind of aggregated way, like a statistic or a diagram. In that case you could likely do all that work on the database already, using the aggregation framework.
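Use cases 1 and 3 could be sketched roughly as follows. The helpers `timeRangeFilter`, `pageWindow` and `hourlyCountPipeline` are hypothetical names introduced for illustration, and counting documents per hour is only an assumed example of an aggregation. The code only builds the query arguments, so it runs without a database; the comments show how they might be passed to the shell.

```javascript
// Hypothetical helper: builds the time-range filter used in the question.
function timeRangeFilter(fromMs, toMs) {
  return { timestamp: { $gte: fromMs, $lte: toMs } };
}

// Use case 1: translate a zero-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  return { skip: page * pageSize, limit: pageSize };
}

// Use case 3: a pipeline that counts documents per hour (an assumed example
// statistic). Each timestamp is rounded down to the start of its hour.
function hourlyCountPipeline(fromMs, toMs) {
  const HOUR_MS = 60 * 60 * 1000;
  return [
    { $match: timeRangeFilter(fromMs, toMs) },
    {
      $group: {
        _id: { $subtract: ["$timestamp", { $mod: ["$timestamp", HOUR_MS] }] },
        count: { $sum: 1 }
      }
    },
    { $sort: { _id: 1 } }
  ];
}

// In the mongo shell, these might be used roughly like this:
//   var w = pageWindow(2, 50);
//   db.collection.find(timeRangeFilter(1380520800000, 1380546000000))
//     .sort({ timestamp: -1 }).skip(w.skip).limit(w.limit);
//   db.collection.aggregate(hourlyCountPipeline(1380520800000, 1380546000000));

const w = pageWindow(2, 50);
console.log(w);
console.log(JSON.stringify(hourlyCountPipeline(1380520800000, 1380546000000)));
```

Note that .skip(n) still has to walk past the skipped entries on the server, so very deep pages get slower. Filtering on the last timestamp seen on the previous page is a common alternative in that situation.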


 