
AWS ElasticSearch 2.3 Java HTTP bulk API

I'm attempting to use the bulk HTTP API from Java on AWS ElasticSearch 2.3. When I use a REST client for the bulk load, I get the following error:

504 GATEWAY_TIMEOUT

When I run it as a Lambda in Java, for HTTP POSTs, I get:

{
  "errorMessage": "2017-01-09T19:05:32.925Z 8e8164a7-d69e-11e6-8954-f3ac8e70b5be Task timed out after 15.00 seconds"
}

Through testing I noticed the bulk API doesn't work with these settings:

    "number_of_shards" : 5,
    "number_of_replicas" : 5

When shards and replicas are set to 1, I can do a bulk load with no problem. I have also tried using this setting to allow for my bulk load:

    "refresh_interval" : -1

but so far it has made no difference. In my Java Lambda, I load my data as an InputStream from an S3 location. What are my options at this point for Java HTTP? Is there anything else in the index settings I could try? Is there anything else in the AWS access policy I could try? Thank you for your time.

Edit 1:

I have also tried these params: _bulk?action.write_consistency=one&refresh but it has made no difference so far.

Edit 2:

Here is what made my bulk load work: setting the consistency param (I did NOT need to set refresh_interval):

            // consistency=one lets the write proceed once the primary shard is active
            URIBuilder uriBuilder = new URIBuilder(myuri).addParameter("consistency", "one");
            HttpPost post = new HttpPost(uriBuilder.build());
            // Stream the bulk payload (read from S3) as the request body
            HttpEntity entity = new InputStreamEntity(myInputStream);
            post.setEntity(entity);

In my experience, this issue occurs when your index replication settings cannot be satisfied by your cluster. This happens either during a network partition, or if you simply set a replication requirement that your physical cluster cannot satisfy.

In my case, this happens when I apply my production settings (number_of_replicas: 3) to my development cluster (which is a single-node cluster).

Your two solutions (setting the replicas to 1, or setting consistency to one) resolve this issue because they allow Elasticsearch to continue the bulk index without waiting for additional replicas to come online.
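To make the reasoning above concrete, here is a minimal sketch of the arithmetic, assuming the default "quorum" write consistency as I recall it from the Elasticsearch 2.x documentation: a write waits for int((primary + number_of_replicas) / 2) + 1 active shard copies, and the quorum is only enforced when number_of_replicas is greater than 1. The class and method names are made up for illustration, and the node counts are assumptions, since the question does not say how many nodes the AWS domain has.

```java
public class WriteConsistencySketch {

    // Shard copies that must be active before a write proceeds under "quorum".
    // Assumption (from the ES 2.x docs as I remember them): with 0 or 1
    // replicas the primary alone is enough; otherwise a majority is required.
    static int quorumCopies(int numberOfReplicas) {
        if (numberOfReplicas <= 1) {
            return 1;
        }
        return (1 + numberOfReplicas) / 2 + 1;
    }

    // Copies of a single shard the cluster can actually host: Elasticsearch
    // never places two copies of the same shard on the same node.
    static int maxActiveCopies(int dataNodes, int numberOfReplicas) {
        return Math.min(dataNodes, 1 + numberOfReplicas);
    }

    // True if a quorum write can ever succeed on a cluster of this size.
    static boolean quorumReachable(int dataNodes, int numberOfReplicas) {
        return maxActiveCopies(dataNodes, numberOfReplicas) >= quorumCopies(numberOfReplicas);
    }

    public static void main(String[] args) {
        // {dataNodes, numberOfReplicas}; the node counts are hypothetical.
        int[][] cases = { {1, 3}, {1, 1}, {2, 5}, {4, 5} };
        for (int[] c : cases) {
            System.out.printf("nodes=%d replicas=%d quorum=%d reachable=%b%n",
                    c[0], c[1], quorumCopies(c[1]), quorumReachable(c[0], c[1]));
        }
    }
}
```

Under these assumptions, a single-node cluster with number_of_replicas set to 3 needs 3 active copies but can only ever have 1, so the bulk request waits until it times out. With 5 replicas the quorum is 4 copies, which needs at least 4 data nodes. Passing consistency=one, or dropping to 1 replica, reduces the requirement to the primary alone.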

Elasticsearch probably could have a more intuitive message on failure; maybe it does in Elasticsearch 5.

Setting your cluster to a single
