Elasticsearch and Spark write error with timestamp fields

Question

I need a way to write the timestamps below to Elasticsearch without getting an error. The code below reads JSON files and then writes them to Elasticsearch.

My code:

import org.apache.spark.sql.types._
import org.apache.spark.sql.streaming.Trigger

// Infer the schema from a sample JSON file
val schemaDF = spark.read.json("/tmp/LTPD/schema.json")
schemaDF.printSchema()
val schema = schemaDF.schema

// Read from JSON files as a stream
val streamingDF = spark
  .readStream
  .schema(schema)
  .json("/tmp/Directory/")

// Write the stream to Elasticsearch (conf is defined elsewhere)
streamingDF
  .writeStream
  .outputMode("append")
  .format("org.elasticsearch.spark.sql")
  .trigger(Trigger.ProcessingTime(conf.getString("spark.trigger")))
  .start("indexname/ourdoctype")
  .awaitTermination()

The code works when the timestamp fields are null, but it fails when the JSON contains strings such as 2019-08-15T09:40:13+00:00 or 2020-03-02T15:13:26Z.

Sample JSON

{
  "name":"Jordan", 
  "date": "2019-06-01T00:00:00+00:00", 
  "gmt": "2020-03-02T15:13:26Z", 
  "skills":["Scala", "Spark", "Akka"]
}

I see this exception:

failed to parse field [metaData.collectionDateUtc] of type [long] in
document with id org.elasticsearch.hadoop.rest.EsHadoopRemoteException:
illegal_argument_exception: For input string: "2019-08-15T09:40:13+00:00"

Answer 1

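You need to write the data out of Spark as a timestamp (if your date field is a String):

.withColumn("date", to_timestamp($"date"))

You also need to change the index mapping as described in the official documentation. The exception shows that the failing field is currently mapped as type long, so a raw date string cannot be parsed into it.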
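
Applied to the code in the question, the conversion could look roughly like the sketch below. The helper name withTimestamps is hypothetical, and the column names date and gmt are taken from the sample JSON, not from the real index. The field in the exception, metaData.collectionDateUtc, is nested, so it would have to be converted by rebuilding the metaData struct; that part is not shown here.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, to_timestamp}

// Hypothetical helper: converts the ISO-8601 string columns from the sample
// JSON ("date" and "gmt") to Spark's TimestampType. Rows where the value is
// null stay null.
def withTimestamps(df: DataFrame): DataFrame =
  df.withColumn("date", to_timestamp(col("date")))
    .withColumn("gmt", to_timestamp(col("gmt")))
```

In the streaming job you would then call withTimestamps(streamingDF).writeStream instead of streamingDF.writeStream and leave the rest of the write unchanged. Both sample formats (the +00:00 offset and the Z suffix) should be accepted by to_timestamp without an explicit pattern, but it is worth checking on a small batch read first.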

Solution 1
0 2020-07-02 10:31:35