Connecting Spark and Elasticsearch
I am trying to run a simple piece of Spark code that copies the contents of an RDD into an Elasticsearch document. Both Spark and Elasticsearch are installed on my local machine.
import org.elasticsearch.spark.sql._
import org.apache.spark.sql.SparkSession

object ES {
  case class Person(ID: Int, name: String, age: Int, numFriends: Int)

  def mapper(line: String): Person = {
    val fields = line.split(',')
    Person(fields(0).toInt, fields(1), fields(2).toInt, fields(3).toInt)
  }

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder().master("local[*]")
      .appName("SparkEs")
      .config("es.index.auto.create", "true")
      .config("es.nodes", "localhost:9200")
      .getOrCreate()

    import spark.implicits._
    val lines = spark.sparkContext.textFile("/home/herch/fakefriends.csv")
    val people = lines.map(mapper).toDF()
    people.saveToEs("spark/people")
  }
}
I get this error after multiple retries:
INFO HttpMethodDirector: I/O exception (java.net.ConnectException)
caught when processing request:Connection timed out (Connection timed
out)
INFO HttpMethodDirector: Retrying request
INFO DAGScheduler: ResultStage 0 (runJob at EsSparkSQL.scala:97)
failed in 525.902 s due to Job aborted due to stage failure: Task 1
in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in
stage 0.0 (TID 1, localhost, executor driver):
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException:
Connection error (check network and/or proxy settings)- all nodes
failed; tried [[192.168.0.22:9200]]
It seems to be a connection problem, but I cannot identify its cause. Elasticsearch is running on localhost:9200 on my local machine, and I can query it from the terminal.
As shown on the elasticsearch/spark connector documentation page, you need to separate the host and port parameters inside the configuration:
val options13 = Map(
  "path" -> "spark/index",
  "pushdown" -> "true",
  "es.nodes" -> "someNode",
  "es.port" -> "9200")
Note how es.nodes contains only the hostname, while es.port contains the HTTP port.
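For reference, an options map like that is meant to be passed to the connector's Spark SQL data source. A minimal sketch of how it might be used to read an index (the format name and the "spark/index" resource come from the docs example; the surrounding setup is illustrative and assumes an existing SparkSession named spark):

// Hedged sketch: reading an Elasticsearch index through the connector's
// Spark SQL data source, using the options map defined above.
val esDf = spark.read
  .format("org.elasticsearch.spark.sql")
  .options(options13)    // host, port and pushdown settings
  .load("spark/index")   // index/type resource to read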
Since you are running everything locally, you also need to set es.nodes.wan.only to true (it defaults to false). I ran into the same problem, and this solved it.
See: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html