
Bulk index in elasticsearch

I have to index a JSON array into an Elasticsearch index. I am using the JavaScript client to index the data.

I looped the array and indexed as follows:

for (var i = 0; i < rawData.length; i++ ) {
    client.create({  
        index: "name",
        type: "rrrrr",  
        body: rawData[i]
    }, function(error, response){
    });
}

I need to avoid the loop, so I decided to use the Bulk API.

I referred to https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ; according to it, every document must be preceded by a header like the following:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }

But the JSON array I have does not contain this header, so I would have to loop here as well. How can I achieve this without a loop? Please share your ideas.

Short answer: The BULK API will not help you with this.

The Bulk API is not meant to reduce the number of loops your code needs to format the data correctly. It exists to reduce the number of round trips between the client and the ES cluster: instead of one call (from client to server) per record to index, the Bulk API lets you make one call for N records, which results in faster indexing on the cluster side and a much shorter execution time on the client side.
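To make the "one call for N records" point concrete, here is a minimal sketch that only does the grouping and calls no client. The helper name toBatches, the sample records, and the batch size of 4 are assumptions made for illustration.

```javascript
// Hypothetical helper: split an array of records into batches of at most n,
// so that each batch could be sent as one bulk request.
function toBatches(records, n) {
  const batches = [];
  for (let i = 0; i < records.length; i += n) {
    batches.push(records.slice(i, i + n));
  }
  return batches;
}

// Sample data standing in for the records to index.
const records = Array.from({ length: 10 }, (_, i) => ({ id: i }));
const batches = toBatches(records, 4);

// One request per record versus one request per batch.
console.log("requests without bulk:", records.length);
console.log("requests with bulk:", batches.length);
```

The loop over the data is still there, but it runs in memory; what shrinks is the number of requests sent to the cluster.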

That said, for the specific fields you mention, the Bulk API does let you avoid repeating the index and type for each and every record you want to bulk index. When using the PHP API, you can set the index and type once on the request and then loop over your raw data only. Here's a code snippet using the Elasticsearch PHP API:

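// Set up an ES client here and assign it to $esclient
$joined = '';
foreach ($data as $q) {
    // The bulk body needs an action line before every document; it can stay
    // empty because the index and type are set once on the request below
    $joined .= '{"index":{}}' . "\n";
    $joined .= json_encode($q) . "\n";
}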

$all_es_params = array();
$all_es_params['body'] = <<<EOT
$joined

EOT;

// Assumes all documents will be put in the same index
$all_es_params['index']     = 'YOUR_INDEX_NAME';

// Assumes all documents have the same type in your bulk call
$all_es_params['type']      = 'YOUR_DOCUMENT_TYPE';

try {
    // Call ES BULK API
    $ret = $esclient->bulk($all_es_params);
} catch (Exception $e) {
    // Something went wrong
}

In the example above:

  • $data is an array in which each element is one record to index.
  • A loop over the data merges all records into a single newline-delimited body value.
  • The Bulk call specifies the index and type values once (instead of putting them in each and every $data element).
  • Finally, the Bulk API is called; it returns a response that reports any errors.
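Since the question uses the JavaScript client, the same idea can be sketched there. This is a minimal sketch rather than the client's documented recipe: buildBulkBody is a hypothetical helper, the sample rawData is made up, and it assumes the client's bulk method accepts a top-level index and type as defaults together with a body array that alternates action objects and documents. The client call itself is left as a comment because it needs a configured client.

```javascript
// Hypothetical helper: turn an array of documents into a bulk body,
// i.e. an array that alternates an action object and the document itself.
function buildBulkBody(docs) {
  const body = [];
  for (const doc of docs) {
    // Empty action: index and type are taken from the request-level defaults.
    body.push({ index: {} });
    body.push(doc);
  }
  return body;
}

// Sample data standing in for rawData from the question.
const rawData = [{ title: "a" }, { title: "b" }];

const params = {
  index: "name",  // index name from the question
  type: "rrrrr",  // type name from the question
  body: buildBulkBody(rawData),
};

// With a configured client (assumed, not shown), this is a single request:
// client.bulk(params, function (error, response) {
//   // check error and response.errors here
// });
console.log(JSON.stringify(params.body));
```

A loop over rawData remains, but it only builds the request body; the cluster receives one request instead of one per document.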
