Elasticsearch Performance Analysis

We are currently evaluating Elasticsearch as our analytics solution. The main driver is that once the data is in Elasticsearch, reporting comes for free with Kibana.

Before adopting it, I have been tasked with doing a performance analysis of the tool.

The main requirement is supporting a PUT rate of 500 events/sec.

I am starting with a small setup, shown below, just to get a sense of the API before moving to a more serious lab.

My strategy is basically to go over CSV files of analytics that match the format I need and put them into Elasticsearch. I am not using the bulk API because in reality the events will not arrive in bulk.

The following is the main code that does this:

        // Created once; serializes a bean to JSON
        ObjectMapper mapper = new ObjectMapper();

        // Counts sent events so they can be compared with the number
        // of events stored in ES
        AnalyticsMetrics metrics = new AnalyticsMetrics();
        metrics.startRecording();

        // listFiles() returns null if mFolder is not a readable directory
        File[] files = new File(mFolder).listFiles();
        if (files == null) {
            throw new IOException("Cannot list files in " + mFolder);
        }

        for (File file : files) {
            // '|' is the column separator
            CSVReader reader = new CSVReader(new FileReader(file), '|');
            try {
                String[] nextLine;
                while ((nextLine = reader.readNext()) != null) {
                    AnalyticRecord record = new AnalyticRecord();
                    record.serializeLine(nextLine);

                    // Generate JSON
                    String json = mapper.writeValueAsString(record);

                    // actionGet() blocks until ES answers this request
                    IndexResponse response = mClient.getClient().prepareIndex("sdk_sync_log", "sdk_sync")
                            .setSource(json)
                            .execute()
                            .actionGet();

                    // Record the event as sent
                    metrics.sent();
                }
            } finally {
                // Release the file handle even if a line fails
                reader.close();
            }
        }

        metrics.stopRecording();

        return metrics;
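
The AnalyticsMetrics class is not shown in the post. As a rough illustration only, a helper of that kind could look like the sketch below. The name ThroughputMeter and all of its methods are assumptions, not the original code; it only counts calls and divides by elapsed time.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a metrics helper; not the AnalyticsMetrics class from the post. */
class ThroughputMeter {
    private final AtomicLong sent = new AtomicLong();
    private long startNanos;
    private long stopNanos;

    void startRecording() { startNanos = System.nanoTime(); }

    void stopRecording() { stopNanos = System.nanoTime(); }

    /** Call once per request that has returned. */
    void sent() { sent.incrementAndGet(); }

    long sentCount() { return sent.get(); }

    /** Events per second for an explicit duration; returns 0 for a non-positive duration. */
    static double eventsPerSecond(long events, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return events * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }

    /** Events per second between startRecording() and stopRecording(). */
    double eventsPerSecond() {
        return eventsPerSecond(sent.get(), stopNanos - startNanos);
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter();
        meter.startRecording();
        for (int i = 0; i < 1000; i++) {
            meter.sent(); // in the indexing loop this would follow actionGet()
        }
        meter.stopRecording();
        System.out.println(meter.sentCount() + " events, "
                + meter.eventsPerSecond() + " evt/sec");
    }
}
```

With a single thread issuing blocking requests, a target of 500 events/sec leaves roughly 2 ms per request round trip, which is the figure a measurement like this would need to confirm.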

I have the following questions:

  1. How do I know through the API when all the requests have completed and the data is saved in Elasticsearch? I could query Elasticsearch for the object count in my index, but that would be a new performance factor in itself, so I am ruling this option out.
  2. Is the above the fastest way to insert objects into Elasticsearch, or are there other optimizations I could make? Keep in mind the bulk API is not an option for now.

Thanks in advance.

PS: The Elasticsearch version I am using on both client and server is 1.0.0.

  1. The Elasticsearch index response has an isCreated() method that returns true if the document is new and false if an existing document was updated. It can be used to check whether the document was successfully inserted or updated. Because the code in the question calls actionGet(), each request blocks until Elasticsearch responds, so all requests have completed once the loop exits.

  2. If bulk indexing is not an option, other areas can be tuned to improve indexing performance, for example:

    • Increasing the index refresh interval with index.refresh_interval.
    • Disabling replicas by setting index.number_of_replicas to 0.
    • Disabling the _source and _all fields if they are not needed.
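
The first two settings in the list can be changed on an existing index through the update-settings REST endpoint. The sketch below is one possible way to do that, under some assumptions: it uses plain HTTP from the JDK rather than the Java client from the question, the class name IndexTuning and its methods are made up for illustration, and the "30s" refresh interval is an arbitrary example value. The _source and _all options are part of the type mapping rather than the index settings, so they are not covered here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (hypothetical name) for tuning an index before a load test. */
class IndexTuning {

    /** Builds the JSON body for the update-settings request. */
    static String settingsBody(String refreshInterval, int replicas) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval
                + "\",\"number_of_replicas\":" + replicas + "}}";
    }

    /** Sends PUT {baseUrl}/{index}/_settings and returns the HTTP status code. */
    static int applySettings(String baseUrl, String index, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/" + index + "/_settings").toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = settingsBody("30s", 0);
        System.out.println(body);
        // Only contact a server when a base URL is passed as the first argument.
        if (args.length == 1) {
            System.out.println("HTTP " + applySettings(args[0], "sdk_sync_log", body));
        }
    }
}
```

Running it with a base URL such as http://localhost:9200 (assuming a local node on the default HTTP port) applies the settings to the sdk_sync_log index from the question. After the load test, the replica count and refresh interval would normally be set back to their previous values the same way.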
