
MongoDB large index build very slow

I have a collection with 400 million documents. Each has 6 DateTime, 1 Boolean, 8 Double, 9 Integer, and 6 String fields. I am trying to build the following index:

db.MyCollection.ensureIndex( 
    { "String1" : 1, "String2" : 1, "String3" : 1, "DateTime1" : 1, "Integer1" : 1, "DateTime2" : 1 }, 
    {background: true} 
);

After running for 5 days it is only half done.

The server is running Windows Server Enterprise and has 4TB disk space and 256GB RAM. Very few other processes are running against the database. No sharding or other special configuration.

Is there any way to speed this up? (Without dropping the background: true option, because I don't want the build to lock me out of the database completely, which is what happens in that case.)

Misconceptions

Speed

Even when not talking of a multikey index, here is what happens. There is a massive table scan going on. MongoDB iterates over the documents, tries to find each field to be indexed, evaluates that field (to null if it does not exist in the current document) and writes its findings to the index. It does this for six fields per document, since the index has six keys. Doing the math: 200,000,000 / (86,400 * 5) tells us that MongoDB does this for roughly 460 documents per second, or needs only about 2.2 milliseconds per document. I would not call that slow. It may take long, but it is not slow.
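The arithmetic can be checked with a few lines of JavaScript. This is only a sketch using the figures from the question (half of 400 million documents processed in 5 days); the variable names are made up for illustration. Note that the document count is divided by the total number of seconds in five days.

```javascript
// Figures from the question: half of 400 million documents after 5 days.
const documentsIndexed = 200000000;
const secondsPerDay = 86400;
const daysElapsed = 5;

// Divide by the total elapsed seconds (86,400 * 5 = 432,000).
const docsPerSecond = documentsIndexed / (secondsPerDay * daysElapsed);
const millisPerDoc = 1000 / docsPerSecond;

console.log(Math.round(docsPerSecond)); // 463
console.log(millisPerDoc.toFixed(2));   // 2.16
```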

{background:true}

Using this parameter does not lock you out of the database. Quite the contrary, as the docs clearly state, both in the Index Creation section and in the tutorial on creating indices in the background. However, there is a sentence which can easily be misinterpreted:

Also, no operation that requires a read or write lock on all databases (e.g. listDatabases) can occur during a foreground index build.

What that means is that, during a foreground build, you cannot run operations which apply to all databases and require a read or write lock.

Ways to improve (in the future)

Sharded Cluster

Use a sharded cluster with replica set shards. It is easy to set up and has multiple advantages besides improved performance. One of them is easy scalability: adding a shard (and thus adding space and computing power to the cluster) is very easy. Backups have less impact on the application. There is no single point of failure any more (when done right, this even applies to outages at the scale of a whole datacenter).

Use a different filesystem

Sorry, but running an application that depends on disk IO performance on a Windows Server does not make sense to me at all. ext4 or XFS are between 25% and 40% faster than NTFS or ReFS, depending on the optimization. This makes a real difference for applications which are as disk IO dependent as your use case. We are talking of a matter of days here (not even taking into account the more efficient memory mapping and the reduced memory consumption of the OS on Linux systems).

{background:true}

While this does not really improve performance (building indices in the background actually takes longer than in the foreground, for obvious reasons), your application stays available while the index is being built. So depending on your needs, this may be a viable option.
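Since the database stays usable during a background build, the build can also be watched from a second shell. The sketch below is a hypothetical helper, not part of MongoDB: it takes the inprog array returned by db.currentOp() and reports the percentage done for each index build. It assumes index builds show up with a msg starting with "Index Build" and a progress counter with done and total; the exact message text may differ between server versions. The sample data is invented to mirror the situation in the question.

```javascript
// Hypothetical helper: summarizes index-build entries from the `inprog`
// array that db.currentOp() returns in the mongo shell.
function indexBuildProgress(inprog) {
  return inprog
    // Assumption: index builds report a msg starting with "Index Build".
    .filter(op => typeof op.msg === "string" && op.msg.startsWith("Index Build"))
    // Keep only entries that carry a usable progress counter.
    .filter(op => op.progress && op.progress.total > 0)
    .map(op => ({
      opid: op.opid,
      percentDone: (100 * op.progress.done) / op.progress.total,
      secondsRunning: op.secs_running,
    }));
}

// Invented sample shaped like the question: half done after 5 days.
const sample = [
  { opid: 1, msg: "Index Build (background)", secs_running: 432000,
    progress: { done: 200000000, total: 400000000 } },
  { opid: 2, op: "query", secs_running: 1 },
];

const report = indexBuildProgress(sample);
console.log(report);
```

In a mongo shell the same function would be called as indexBuildProgress(db.currentOp().inprog).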

Side note: It is a Bad Idea™ to scale vertically when using MongoDB, since it was explicitly designed to be scaled horizontally. This especially applies to large collections like yours, as parallel processing would greatly improve the performance of your application.
