AWS ElasticSearch 2.3 Java HTTP bulk API

I'm attempting to use the bulk HTTP API from Java on AWS Elasticsearch 2.3. When I use a REST client for the bulk load, I get the following error:

504 GATEWAY_TIMEOUT

When I run it as a Java Lambda function that issues HTTP POSTs, I get:

{
  "errorMessage": "2017-01-09T19:05:32.925Z 8e8164a7-d69e-11e6-8954-f3ac8e70b5be Task timed out after 15.00 seconds"
}

Through testing I noticed that the bulk API doesn't work with these settings:

    "number_of_shards" : 5,
    "number_of_replicas" : 5

When shards and replicas are both set to 1, the bulk load works without a problem. I have also tried this setting to get the bulk load through:

    "refresh_interval" : -1

but so far it has had no effect at all. In the Java Lambda, I load my data as an InputStream from an S3 location. What are my options at this point for Java HTTP? Is there anything else in the index settings I could try? Is there anything else in the AWS access policy I could try? Thank you for your time.

Edit 1:

I have also tried these parameters: _bulk?action.write_consistency=one&refresh, but they have made no difference so far.

Edit 2:

Here is what made my bulk load work: setting the consistency parameter (I did NOT need to set refresh_interval):

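    // Requires Apache HttpClient (org.apache.http.*).
    URIBuilder uriBuilder = new URIBuilder(myuri);
    // Proceed once the primary shard is available instead of waiting for a quorum.
    uriBuilder.addParameter("consistency", "one");
    HttpPost post = new HttpPost(uriBuilder.build());
    // myInputStream holds the bulk body loaded from S3.
    HttpEntity entity = new InputStreamEntity(myInputStream);
    post.setEntity(entity);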
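
For anyone who cannot pull Apache HttpClient into a Lambda package, the same request can be sketched with only the JDK. This is a rough sketch, not what I actually ran: the class name, method names, and timeout values are my own assumptions, and it assumes the domain's access policy accepts unsigned requests (for example an IP-based policy), since it does no request signing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; names and timeout values are illustrative assumptions.
class BulkRequestSketch {

    // Appends the write-consistency parameter to the _bulk endpoint.
    static URI bulkUri(String endpoint, String consistency) {
        String separator = endpoint.contains("?") ? "&" : "?";
        return URI.create(endpoint + separator + "consistency=" + consistency);
    }

    // Copies the bulk body to the endpoint and returns the HTTP status code.
    // HttpURLConnection buffers the body in memory before sending by default.
    static int postBulk(URI uri, InputStream body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(10_000); // stays under the 15 s Lambda timeout in the question
        try (OutputStream out = conn.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return conn.getResponseCode();
    }
}
```

Usage would look like `BulkRequestSketch.postBulk(BulkRequestSketch.bulkUri(myuri, "one"), myInputStream)`, where `myuri` is the full `_bulk` URL as a string.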

In my experience, this issue occurs when your cluster cannot satisfy the index's replication settings. This happens either during a network partition or when you simply set a replication requirement that your physical cluster cannot meet.

In my case, this happens when I apply my production settings (number_of_replicas: 3) to my development cluster, which is a single-node cluster.

Your two solutions (setting the replicas to 1, or setting consistency to one) resolve this issue because they allow Elasticsearch to continue the bulk index without waiting for additional replicas to come online.
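
To make the arithmetic concrete, here is a small sketch of the default quorum check as I understand it from the Elasticsearch 2.x documentation: a write needs int((primary + number_of_replicas) / 2) + 1 active shard copies, and the quorum is only enforced when number_of_replicas is greater than 1. The class and method names are made up for illustration; this is not Elasticsearch code.

```java
// Illustrative only: mirrors the documented 2.x quorum rule, not the actual implementation.
class WriteQuorumSketch {

    // Shard copies that must be active before a write proceeds with consistency=quorum.
    static int requiredActiveCopies(int numberOfReplicas) {
        if (numberOfReplicas <= 1) {
            return 1; // quorum is not enforced with 0 or 1 replicas
        }
        return (1 + numberOfReplicas) / 2 + 1; // integer division, 1 = the primary
    }

    // True if a write would go ahead given the number of currently active copies.
    static boolean writeCanProceed(int numberOfReplicas, int activeCopies) {
        return activeCopies >= requiredActiveCopies(numberOfReplicas);
    }

    public static void main(String[] args) {
        System.out.println(requiredActiveCopies(5)); // 4
        System.out.println(requiredActiveCopies(3)); // 3
        System.out.println(writeCanProceed(3, 1));   // false
    }
}
```

With number_of_replicas set to 5, four copies of a shard have to be active, and with my setting of 3 on a single node only the primary is ever active, so the write waits until it times out. With 1 replica, or with consistency=one, only the primary is needed.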

Elasticsearch could probably give a more intuitive failure message; maybe it does in Elasticsearch 5.

Setting your cluster to a single
