Indexing an entire MongoDB collection into Elasticsearch quickly

I have a collection in MongoDB which I am indexing into Elasticsearch. I am doing this in a C# process. The collection has 100 million documents, and for each document I have to query other documents in order to denormalise the data into the Elasticsearch index.

This all takes time. Reading from MongoDB is the slow part (indexing is relatively quick). I am batching the data from MongoDB as efficiently as I can, but the process still takes over 2 days.

This only has to happen when the mapping in Elasticsearch changes, but that has happened a couple of times over the last month. Are there any ways of improving the performance of this?

Maybe you don't need to launch the import from scratch (I mean re-importing from MongoDB) when you change mappings. Read this: Elasticsearch Reindex API

When you need to change the mapping, you must:

  1. Create a new index with the new mapping.
  2. Reindex the data from the old index into the new index using the built-in reindex feature of Elasticsearch.
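
The two steps above can be sketched as plain REST requests. This is a minimal sketch, not a definitive implementation: the index names `products_v1` and `products_v2` and the `title` field are made up for illustration, the helper functions are hypothetical, and the mapping body assumes the typeless format used by Elasticsearch 7 and later. The code only builds the requests; sending them is left to whatever HTTP client you already use.

```python
import json


def create_index_request(new_index, mapping_properties):
    """Build (method, path, body) for creating an index with an explicit mapping.

    Assumes the typeless mapping format of Elasticsearch 7+.
    """
    body = {"mappings": {"properties": mapping_properties}}
    return "PUT", f"/{new_index}", json.dumps(body)


def reindex_request(old_index, new_index):
    """Build (method, path, body) for copying documents via the Reindex API."""
    body = {"source": {"index": old_index}, "dest": {"index": new_index}}
    return "POST", "/_reindex", json.dumps(body)


# Hypothetical index names and field, for illustration only.
steps = [
    create_index_request("products_v2", {"title": {"type": "text"}}),
    reindex_request("products_v1", "products_v2"),
]

for method, path, body in steps:
    print(method, path, body)
```

Building the requests separately from sending them keeps the sketch independent of any particular client library; the same two bodies can be sent from the C# process as well.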

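After this, the old documents will be indexed with the new mappings inside the new index. The built-in reindex in Elasticsearch should also be much quicker than importing from MongoDB via the HTTP API, because the already denormalised documents are copied directly from the old index. Note that reindex reads the stored `_source` of each document, so `_source` must be enabled on the old index.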

If you use reindex, don't forget the `wait_for_completion` parameter (it is described in the documentation). Setting it to `false` runs the reindex in the background and returns a task ID that you can use to check on progress, instead of holding the HTTP request open until the reindex finishes.
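
A rough sketch of how the background variant might be driven: start the reindex with `wait_for_completion=false`, take the task ID from the response, then poll the Tasks API until the task reports `completed`. The `get_json` callable is a hypothetical stand-in for your own HTTP client (it takes a path and returns the parsed JSON response), and the task ID and responses below are faked so the example runs without a cluster.

```python
import time

# Query string that makes the Reindex API return a task instead of blocking.
REINDEX_PATH = "/_reindex?wait_for_completion=false"


def wait_for_task(task_id, get_json, poll_seconds=5.0, sleep=time.sleep):
    """Poll GET /_tasks/<task_id> until the response has "completed": true.

    get_json is a hypothetical callable: path -> parsed JSON dict.
    """
    while True:
        status = get_json(f"/_tasks/{task_id}")
        if status.get("completed"):
            return status
        sleep(poll_seconds)


# Fake responses standing in for a real cluster: running, then finished.
fake_responses = iter([{"completed": False}, {"completed": True}])


def fake_get_json(path):
    print("GET", path)
    return next(fake_responses)


# "node-1:42" is a made-up task ID; a real one comes from the reindex response.
final = wait_for_task("node-1:42", fake_get_json, sleep=lambda _: None)
print("finished:", final["completed"])
```

For a copy of 100 million documents, polling like this avoids client-side HTTP timeouts on the reindex request itself.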

Will this approach solve your problem?
