简体   繁体   中英

How to upsert or partial updates with script documents in ElasticSearch with Spark?

I have a pseudocode in python that reads from a Kafka stream and upsert documents in Elasticsearch (incrementing a counter view if the document exists already.

for message in consumer:

    msg = json.loads(message.value)
    print(msg)
    index = INDEX_NAME
    es_id = msg["id"]
    script = {"script":"ctx._source.view+=1","upsert" : msg}
    es.update(index=index, doc_type="test", id=es_id, body=script)

Since I want to use it in a distributed environment, I am using Spark Structured Streaming

df.writeStream \
.format("org.elasticsearch.spark.sql")\
.queryName("ESquery")\
.option("es.resource","credentials/url") \
.option("checkpointLocation", "checkpoint").start()

or SparkStreaming in scala that reads from KafkaStream:

// Initializing Spark Streaming Context and kafka stream
sparkConf.setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
[...] 
val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topicsSet, kafkaParams)
    )

[...]
val urls = messages.map(record => JsonParser.parse(record.value()).values.asInstanceOf[Map[String, Any]])
urls.saveToEs("credentials/credential")

.saveToEs(...) is the API of elastic-hadoop.jar documented here . Unfortunately this repo is not really well documented. So I cannot understand where I can put the script command.

Is there anyone can help me? Thank you in advance

You should be able to do it by setting write mode "update" ( or upsert) and passing your script as "script" (depends on ES version).

EsSpark.saveToEs(rdd, "spark/docs", Map("es.mapping.id" -> "id", "es.write.operation" -> "update","es.update.script.inline" -> "your script" , ))

Probably you want to use "upsert"

There are some good unit tests in cascading integration in same library; These settings should be good for spark as both uses same writer.

I suggest to read unit tests to pick correct settings for your ES version.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM