
Bulk index in Elasticsearch

I have to index a JSON array into an Elasticsearch index. I am using the JavaScript client to index the data.

I looped over the array and indexed each element as follows:

for (var i = 0; i < rawData.length; i++ ) {
    client.create({  
        index: "name",
        type: "rrrrr",  
        body: rawData[i]
    }, function(error, response){
    });
}

I need to avoid the loop, so I decided to use the Bulk API.

I referred to https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html , which says that for every document we have to specify a header as follows:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }

But the JSON array I have does not contain this header, so I would have to loop here as well. How can I achieve this without a loop? Please share your ideas.

Short answer: The Bulk API will not help you with this.

The Bulk API is not meant to reduce the number of loops your code may need to perform to format the data correctly; it exists to reduce the number of round trips between the client and the ES cluster. Instead of 1 call (from client to server) per record to index, the Bulk API allows you to make 1 call for N records, which results in faster indexing on the cluster side and a much faster execution time on the client side.
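As a rough illustration of "1 call for N records", the sketch below turns an array of plain documents into one newline-delimited Bulk body. The helper name `buildBulkBody` and the sample `rawData` are assumptions made for this example, not part of any client library. Note that the iteration does not disappear; it only moves into `flatMap`.

```javascript
// Build the request body for a single Bulk call from an array of plain documents.
// Each document becomes two lines: an action line and the document itself.
// buildBulkBody is a hypothetical helper, not part of the Elasticsearch client.
function buildBulkBody(docs, indexName) {
  const lines = docs.flatMap((doc) => [
    JSON.stringify({ index: { _index: indexName } }),
    JSON.stringify(doc),
  ]);
  // The Bulk API expects newline-delimited JSON terminated by a final newline.
  return lines.join("\n") + "\n";
}

// Sample data (assumption); in the question this would be the existing rawData array.
const rawData = [{ title: "first" }, { title: "second" }];
console.log(buildBulkBody(rawData, "name"));
```

The resulting string can be sent as the body of one Bulk request instead of issuing one `create` call per element.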

That said, for the specific fields you mention, the Bulk API does let you avoid specifying them for each and every record you want to bulk index. When using the PHP API, you can set the index and type once and then loop over your raw data only. Here's a code snippet using the Elasticsearch PHP API:

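// Set up an ES client here, for example (elasticsearch-php 7.x and earlier):
// $esclient = Elasticsearch\ClientBuilder::create()->build();
$esclient = null; // replace with your configured client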
$joined = '';
foreach($data as $q) {
    $joined .= json_encode($q) . "\n";
}

$all_es_params = array();
// The body is newline-delimited JSON and must end with a newline
$all_es_params['body'] = $joined;

// Assumes all documents will be put in the same index
$all_es_params['index']     = 'YOUR_INDEX_NAME';

// Assumes all documents have same type in your bulk call
$all_es_params['type']      = 'YOUR_DOCUMENT_TYPE';

try {
    // Call ES BULK API
    $ret = $esclient->bulk($all_es_params);
} catch (Exception $e) {
    // Something went wrong
}

In the example above:

  • $data is an array already formatted for the Bulk API: each element is one line of the bulk body (an action line such as {"index": {}} followed by the document to index).
  • A loop over the data merges all records into one body value.
  • The Bulk call is set up to specify the index and type values once (instead of putting this data in each and every $data element).
  • Finally, the Bulk API is called and returns a response that reports any errors.
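Since the question uses the JavaScript client, here is a minimal sketch of the same idea in JavaScript. It assumes `client` exposes a `bulk(params)` method that returns a Promise resolving to the Bulk response body and accepts an array of objects as `body`; some client versions wrap the response in an outer object, so adjust accordingly. The names `bulkIndex` and `failedItems` are hypothetical helpers introduced for illustration.

```javascript
// Collect the items that Elasticsearch reports as failed in a Bulk response.
// A Bulk response has a top-level boolean `errors` and one entry in `items` per action.
function failedItems(bulkResponse) {
  if (!bulkResponse.errors) return [];
  return bulkResponse.items
    .map((item, position) => {
      const action = Object.keys(item)[0]; // "index", "create", "update" or "delete"
      return { position, action, ...item[action] };
    })
    .filter((result) => result.error);
}

// Send all documents in one Bulk call; index and type are given once,
// so each action line can stay empty: { index: {} }.
// Assumption: client.bulk(params) returns a Promise of the response body.
async function bulkIndex(client, docs, index, type) {
  const body = docs.flatMap((doc) => [{ index: {} }, doc]);
  const response = await client.bulk({ index, type, body });
  return failedItems(response);
}
```

A call might look like `bulkIndex(client, rawData, "name", "rrrrr")`; an empty result means every document was accepted. Checking the per-item results matters because a Bulk request can succeed as a whole while individual documents are rejected.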


 