Elasticsearch Performance Analysis

We are currently evaluating Elasticsearch as our analytics solution. The main driver is that once the data is loaded into Elasticsearch, the reporting comes for free with Kibana.

Before adopting it, I have been tasked with doing a performance analysis of the tool.

The main requirement is supporting a PUT rate of 500 events/sec.

I am starting with the small setup below, just to get a feel for the API before I move it to a more serious lab.

My strategy is basically to go over CSV files of analytics data in the format I need and put each record into Elasticsearch. I am not using the bulk API because in reality the events will not arrive in bulk.

The following is the main code that does this:

        // Created once, used for creating a JSON from a bean
        ObjectMapper mapper = new ObjectMapper();

        // Creating a measurement for checking the count of sent events vs
        // ES stored events
        AnalyticsMetrics metrics = new AnalyticsMetrics();
        metrics.startRecording();

        File dir = new File(mFolder);
        for (File file : dir.listFiles()) {

            CSVReader reader = new CSVReader(new FileReader(file.getAbsolutePath()), '|');
            try {
                String[] nextLine;
                while ((nextLine = reader.readNext()) != null) {
                    AnalyticRecord record = new AnalyticRecord();
                    record.serializeLine(nextLine);

                    // Generate json
                    String json = mapper.writeValueAsString(record);

                    // actionGet() blocks until the index request has completed
                    IndexResponse response = mClient.getClient().prepareIndex("sdk_sync_log", "sdk_sync")
                            .setSource(json)
                            .execute()
                            .actionGet();

                    // Recording Metrics
                    metrics.sent();
                }
            } finally {
                // Release the file handle even if a line fails
                reader.close();
            }
        }

        metrics.stopRecording();

        return metrics;

I have the following questions:

  1. How do I know through the API when all the requests have completed and the data is saved in Elasticsearch? I could query Elasticsearch for the object count in my particular index, but doing that would itself affect performance, so I am ruling this option out.
  2. Is the above the fastest way to insert objects into Elasticsearch, or are there other optimizations I could make? Keep in mind that the bulk API is not an option for now.

Thanks in advance.

PS: the Elasticsearch version I am using on both client and server is 1.0.0.

  1. The Elasticsearch index response has an isCreated() method that returns true if the document is new and false if an existing document was updated. It can be used to check that the document was successfully inserted or updated.

  2. If bulk indexing is not an option, there are other areas that can be tweaked to improve performance, such as:

    • Increasing the index refresh interval using index.refresh_interval
    • Disabling replicas by setting index.number_of_replicas to 0
    • Disabling the _source and _all fields if they are not needed
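
The first two settings in that list can be changed on an existing index through the update-settings REST endpoint (PUT /{index}/_settings). Below is a minimal sketch using only the JDK. The class name IndexSettingsTuner, the "30s" interval, and the http://localhost:9200 address are assumptions for illustration, not values from the question.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds and sends an index settings update over HTTP.
final class IndexSettingsTuner {

    // Builds the JSON body for the two settings named in the answer.
    static String settingsBody(String refreshInterval, int replicas) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval
                + "\",\"number_of_replicas\":" + replicas + "}}";
    }

    // Sends PUT {baseUrl}/{index}/_settings and returns the HTTP status code.
    // baseUrl is an assumption, e.g. "http://localhost:9200".
    static int applySettings(String baseUrl, String index, String body) throws IOException {
        URL url = new URL(baseUrl + "/" + index + "/_settings");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For the index in the question, the call would look something like IndexSettingsTuner.applySettings("http://localhost:9200", "sdk_sync_log", IndexSettingsTuner.settingsBody("30s", 0)). Once the load test is finished, the same call can restore the previous values.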

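On the first point: in the question's loop each actionGet() call blocks until Elasticsearch has responded, so every request has already completed by the time the loop exits. If the calls were instead made asynchronously, for example by passing an ActionListener to execute(), a small counter could track when everything in flight has finished without querying the index. The sketch below is dependency-free and purely illustrative. IndexCompletionTracker and its methods are hypothetical names, not part of the Elasticsearch API.

```java
// Hypothetical helper: counts requests in flight and lets a caller wait
// until every one of them has either succeeded or failed.
final class IndexCompletionTracker {
    private long inFlight = 0;
    private long succeeded = 0;
    private long failed = 0;

    // Call just before a request is handed to the client.
    public synchronized void sent() {
        inFlight++;
    }

    // Call from the success callback (assumed: ActionListener.onResponse).
    public synchronized void succeeded() {
        succeeded++;
        inFlight--;
        notifyAll();
    }

    // Call from the failure callback (assumed: ActionListener.onFailure).
    public synchronized void failed() {
        failed++;
        inFlight--;
        notifyAll();
    }

    // Waits until nothing is in flight; returns false on timeout or interrupt.
    public synchronized boolean awaitAll(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight > 0) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public synchronized long getSucceeded() {
        return succeeded;
    }

    public synchronized long getFailed() {
        return failed;
    }
}
```

The sending loop would call sent() before each request, and after the last one awaitAll(...) blocks until every callback has fired. Comparing getSucceeded() and getFailed() with the number of events sent then gives the stored-versus-sent check without an extra count query.
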