
What is the best approach for using the Elasticsearch remote reindex API when migrating millions of documents?

I have approximately 100 million documents in an index, and I want to migrate them to a new cluster using the reindex API. I want to do it in a throttled manner.

I tried setting requests_per_second to 100000, but it will take hours to complete the whole process.
Q.1 Can I set requests_per_second to something like 1000000 to reduce the processing time?
Q.2 Is there a better approach for reindexing in a throttled manner?
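For context on Q.1: the reindex documentation describes throttling as padding each batch with a wait, where target_time = batch_size / requests_per_second and wait_time = target_time - write_time. The sketch below only does that arithmetic. The function names are illustrative, and clamping the wait at zero is an assumption about the case where writing is already slower than the target. By this arithmetic, at 100000 requests per second the throttle alone bounds a 100 million document run at about 1000 seconds, so if the job takes hours, the limit is probably indexing throughput rather than the throttle, and raising the value to 1000000 may not shorten it much.

```python
# Throttling arithmetic as described in the Elasticsearch reindex docs:
#   target_time = batch_size / requests_per_second
#   wait_time   = target_time - write_time
# Function names are illustrative, not part of any Elasticsearch client.


def throttle_wait_seconds(batch_size: int, requests_per_second: float,
                          write_time: float) -> float:
    """Padding added after one batch; clamped at zero (an assumption)."""
    target_time = batch_size / requests_per_second
    return max(0.0, target_time - write_time)


def throttle_floor_seconds(total_docs: int, requests_per_second: float) -> float:
    """Lower bound on total run time imposed by the throttle alone."""
    return total_docs / requests_per_second


if __name__ == "__main__":
    # Docs example: 1000-document batches at 500 req/s, 0.5 s spent writing.
    print(throttle_wait_seconds(1000, 500, 0.5))
    # 100 million documents at the two rates from the question.
    for rps in (100_000, 1_000_000):
        print(rps, throttle_floor_seconds(100_000_000, rps))
```

Setting requests_per_second=-1 disables throttling entirely, and a running task can be adjusted without restarting it via POST _reindex/<task_id>/_rethrottle?requests_per_second=...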

Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provides a convenient way to break the request down into smaller parts. Note, however, that reindexing from a remote cluster does not support manual or automatic slicing, so slices only helps when source and destination are in the same cluster.

POST _reindex?slices=5&refresh
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
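The request above is a local reindex. For the remote case in the question, the body instead carries a source.remote.host, and it is sent to the destination cluster. The sketch below only assembles such a request; the helper name and the host http://old-cluster:9200 are hypothetical. It assumes the remote host is already allow-listed on the destination cluster (the reindex.remote.whitelist setting in the versions I know of), and it sets source.size, the batch size (default 1000), which you may need to lower for large documents because remote batches are buffered on-heap.

```python
import json


def build_remote_reindex(remote_host, source_index, dest_index,
                         batch_size=1000, requests_per_second=-1, query=None):
    """Hypothetical helper: return (path, json_body) for a reindex-from-remote call.

    No `slices` parameter: slicing is not supported for remote reindex.
    """
    source = {
        "remote": {"host": remote_host},  # scheme, host and port are required
        "index": source_index,
        "size": batch_size,               # documents per scroll batch
    }
    if query is not None:
        source["query"] = query           # optionally restrict the documents copied
    body = {"source": source, "dest": {"index": dest_index}}
    # wait_for_completion=false returns a task id instead of blocking the client.
    path = (f"/_reindex?requests_per_second={requests_per_second}"
            "&wait_for_completion=false")
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = build_remote_reindex("http://old-cluster:9200",
                                      "my-index-000001", "my-new-index-000001",
                                      batch_size=500, requests_per_second=5000)
    print("POST", path)
    print(body)
```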

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-automatic-slice
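Because automatic slicing is only available for local reindexing, one workaround for a remote migration is to split the documents yourself into disjoint range queries and run one remote reindex per range, each with its own throttle. This is a sketch, not a built-in feature: it assumes a numeric, roughly evenly distributed field (doc_id here is hypothetical), and documents missing that field would not match any range.

```python
def range_chunks(field, start, end, parts):
    """Split [start, end) on `field` into `parts` disjoint range queries."""
    if parts < 1 or end - start < parts:
        raise ValueError("need parts >= 1 and at least one value per part")
    # Integer boundaries; the last one is exactly `end`.
    bounds = [start + (end - start) * i // parts for i in range(parts + 1)]
    # gte/lt keeps adjacent chunks from overlapping at their shared boundary.
    return [
        {"range": {field: {"gte": lo, "lt": hi}}}
        for lo, hi in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    for q in range_chunks("doc_id", 0, 100_000_000, 4):
        print(q)
```

Each query would go in the source.query of its own remote reindex request.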

You can also read the advice on tuning for indexing speed, for example:

  • Disabling refresh for the duration of the migration
  • Reducing replicas to 0, etc.
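Applied to the destination index, those two tips amount to one settings change before the reindex and another after it. The sketch below builds the PUT /<index>/_settings bodies; the helper name is hypothetical, and the restore values shown are the Elasticsearch defaults (refresh every 1s, 1 replica), so substitute whatever the index used before.

```python
def bulk_load_settings(final_refresh="1s", final_replicas=1):
    """Hypothetical helper: settings bodies to apply before and after a bulk reindex."""
    return {
        # Before: stop periodic refreshes and skip writing replica copies.
        "before": {"index": {"refresh_interval": "-1", "number_of_replicas": 0}},
        # After: restore normal settings; replicas are then rebuilt from the primaries.
        "after": {"index": {"refresh_interval": final_refresh,
                            "number_of_replicas": final_replicas}},
    }


if __name__ == "__main__":
    steps = bulk_load_settings()
    print("PUT /my-new-index-000001/_settings", steps["before"])
    print("PUT /my-new-index-000001/_settings", steps["after"])
```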

Link:

https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html

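Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.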

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM