
Elasticsearch-Hadoop library cannot connect to Docker container

I have a Spark job that reads from Cassandra, processes/transforms/filters the data, and writes the results to Elasticsearch. I use Docker for my integration tests, and I am running into trouble writing from Spark to Elasticsearch.
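
For context, a minimal sketch of the shape of such a job (the keyspace/table names and the Event case class here are placeholders, not the actual code):

import com.datastax.spark.connector._
import org.elasticsearch.spark.rdd.EsSpark

// Placeholder row type; the real schema is application-specific.
case class Event(id: String, parent_id: String, payload: String)

// Read from Cassandra, filter/transform, then write to Elasticsearch.
val events = sc.cassandraTable[Event]("my_keyspace", "events")
  .filter(_.payload.nonEmpty)
EsSpark.saveToEs(events, "hot/mytype", Map("es.mapping.id" -> "id", "es.mapping.parent" -> "parent_id"))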

Dependencies:

"joda-time"              % "joda-time"          % "2.9.4",
"javax.servlet"          %  "javax.servlet-api" % "3.1.0",
"org.elasticsearch"      %  "elasticsearch"     % "2.3.2",
"org.scalatest"          %% "scalatest"         % "2.2.1",
"com.github.nscala-time" %% "nscala-time"       % "2.10.0",
"cascading"              %   "cascading-hadoop" % "2.6.3",
"cascading"              %   "cascading-local"  % "2.6.3",
"com.datastax.spark"     %% "spark-cassandra-connector" % "1.4.2",
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.5",
"org.elasticsearch"      %  "elasticsearch-hadoop"      % "2.3.2" excludeAll(ExclusionRule("org.apache.storm")),
"org.apache.spark"       %% "spark-catalyst"            % "1.4.0" % "provided"

In my unit tests I can connect to Elasticsearch using a TransportClient to set up my template and index.

That is, the following works:

// Spark configuration: Cassandra input settings plus Elasticsearch output settings
val conf = new SparkConf().setAppName("test_reindex").setMaster("local")
  .set("spark.cassandra.input.split.size_in_mb", "67108864")
  .set("spark.cassandra.connection.host", cassandraHostString)
  .set("es.nodes", elasticsearchHostString)
  .set("es.port", "9200")
  .set("http.publish_host", "")
sc = new SparkContext(conf)

// Native TransportClient (port 9300), used only to set up the template, index, and alias
esClient = TransportClient.builder().build()
esClient.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(elasticsearchHostString), 9300))
esClient.admin().indices().preparePutTemplate(testTemplate).setSource(Source.fromInputStream(getClass.getResourceAsStream("/mytemplate.json")).mkString).execute().actionGet()
esClient.admin().indices().prepareCreate(esTestIndex).execute().actionGet()
esClient.admin().indices().prepareAliases().addAlias(esTestIndex, "hot").execute().actionGet()

However, when I try to run

EsSpark.saveToEs(
  myRDD,
  "hot/mytype",
  Map("es.mapping.id" -> "id", "es.mapping.parent" -> "parent_id")
)

I receive this stack trace:

org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.17.0.2:9200]] 
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:442)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:518)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:524)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:491)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:412)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:400)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/08/08 12:30:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.17.0.2:9200]] 
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:442)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:518)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:524)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:491)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:412)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:400)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I can verify, using 'docker network inspect bridge', that it is trying to connect to the correct IP address.

docker network inspect bridge
[
{
    "Name": "bridge",
    "Id": "ef184e3be3637be28f854c3278f1c8647be822a9413120a8957de6d2d5355de1",
    "Scope": "local",
    "Driver": "bridge",
    "EnableIPv6": false,
    "IPAM": {
        "Driver": "default",
        "Options": null,
        "Config": [
            {
                "Subnet": "172.17.0.0/16",
                "Gateway": "172.17.0.1"
            }
        ]
    },
    "Internal": false,
    "Containers": {
        "0c79680de8ef815bbe4bdd297a6f845cce97ef18bb2f2c12da7fe364906c3676": {
            "Name": "analytics_rabbitmq_1",
            "EndpointID": "3f03fdabd015fa1e2af802558aa59523f4a3c8c72f1231d07c47a6c8e60ae0d4",
            "MacAddress": "02:42:ac:11:00:04",
            "IPv4Address": "172.17.0.4/16",
            "IPv6Address": ""
        },
        "9b1f37c8df344c50e042c4b3c75fcb2774888f93fd7a77719fb286bb13f76f38": {
            "Name": "analytics_elasticsearch_1",
            "EndpointID": "fb083d27aaf8c0db1aac90c2a1ea2f752c46d8ac045e365f4b9b7d1651038a56",
            "MacAddress": "02:42:ac:11:00:02",
            "IPv4Address": "172.17.0.2/16",
            "IPv6Address": ""
        },
        "ed0cfad868dbac29bda66de6bee93e7c8caf04d623d9442737a00de0d43c372a": {
            "Name": "analytics_cassandra_1",
            "EndpointID": "2efa95980d681b3627a7c5e952e2f01980cf5ffd0fe4ba6185b2cab735784df6",
            "MacAddress": "02:42:ac:11:00:03",
            "IPv4Address": "172.17.0.3/16",
            "IPv6Address": ""
        }
    },
    "Options": {
        "com.docker.network.bridge.default_bridge": "true",
        "com.docker.network.bridge.enable_icc": "true",
        "com.docker.network.bridge.enable_ip_masquerade": "true",
        "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
        "com.docker.network.bridge.name": "docker0",
        "com.docker.network.driver.mtu": "1500"
    },
    "Labels": {}
}
]

I am running everything locally on a MacBook/OSX. I am at a loss as to why I can connect to the Docker container using the TransportClient and through my browser, but the function EsSpark.saveToEs(...) always fails.
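
A useful check here (not in the original post, but a standard diagnostic): ask Elasticsearch which HTTP address it publishes, since elasticsearch-hadoop discovers nodes and then connects to that published address rather than the one you configured. Assuming port 9200 is mapped to the host:

curl -s 'http://localhost:9200/_nodes/http?pretty'

On a setup like this, the publish_address in the response is typically the container-internal 172.17.0.2:9200, which is not reachable from the OSX host.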

Setting

.config("es.nodes.wan.only", "true")

can solve this issue.
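
As a sketch of how that fits into the SparkConf from the question (only es.nodes.wan.only is new; everything else is unchanged from the original code):

val conf = new SparkConf().setAppName("test_reindex").setMaster("local")
  .set("spark.cassandra.connection.host", cassandraHostString)
  .set("es.nodes", elasticsearchHostString)
  .set("es.port", "9200")
  // Connect only through the declared es.nodes and skip node discovery,
  // so the connector never tries the container-internal 172.17.0.2 address.
  .set("es.nodes.wan.only", "true")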

es.nodes.ingest.only

(default false) Whether to use Elasticsearch ingest nodes only. When enabled, elasticsearch-hadoop will route all of its requests (after nodes discovery, if enabled) through the ingest nodes within the cluster. The purpose of this configuration setting is to avoid incurring the cost of forwarding data meant for a pipeline from non-ingest nodes; it is really only useful when writing data to an ingest pipeline (see es.ingest.pipeline above).
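
For completeness, a hypothetical use of that setting (ingest pipelines, es.ingest.pipeline, and es.nodes.ingest.only exist only in Elasticsearch/elasticsearch-hadoop 5.x and later, so this does not apply to the 2.3.2 stack above; "my-pipeline" is a made-up name):

EsSpark.saveToEs(
  myRDD,
  "hot/mytype",
  Map(
    "es.ingest.pipeline"   -> "my-pipeline", // route documents through this ingest pipeline
    "es.nodes.ingest.only" -> "true"         // send requests straight to ingest nodes
  )
)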
