Spark Streaming: Read JSON from Kafka and add event_time
I am trying to write a stateful Spark Structured Streaming job that reads from Kafka. As part of the requirement I need to add 'event_time' to my stream as an additional column. I am trying something like this:
val schema = spark.read.json("sample-data/test.json").schema

val myStream = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "myTopic")
  .load()
val df = myStream.select(from_json($"value".cast("string"), schema).alias("value"))
val withEventTime = df.selectExpr("*", "cast (value.arrivalTime as timestamp) as event_time")
But I keep getting this message:
cannot resolve 'arrivalTime' given input columns: [value]
How do I refer to all the elements in my JSON?
I believe I was able to solve this using:
val withEventTime = df.withColumn("event_time", to_timestamp(col("value.arrivalTime")))
Not sure why this worked and not the other one.
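In case it helps others, here is a minimal end-to-end sketch of an alternative that avoids nested-field references altogether: flatten the struct with select("value.*") so every JSON field becomes a top-level column, then derive event_time from the plain column name. The withWatermark call and its 10-minute threshold are my own assumptions (added because the job is meant to be stateful); the arrivalTime field name, sample file, and broker/topic settings are taken from the question.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, to_timestamp}

val spark = SparkSession.builder()
  .appName("KafkaEventTime")
  .getOrCreate()
import spark.implicits._

// Infer the schema from a static sample that matches the streamed records.
val schema = spark.read.json("sample-data/test.json").schema

val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "myTopic")
  .load()
  // Kafka delivers the payload as binary; cast to string before parsing.
  .select(from_json($"value".cast("string"), schema).alias("value"))
  // Flatten the struct so each JSON field is a top-level column.
  .select("value.*")

// Build the event-time column and declare a watermark so stateful
// operators (aggregations, stream-stream joins) can evict old state.
val withEventTime = parsed
  .withColumn("event_time", to_timestamp(col("arrivalTime")))
  .withWatermark("event_time", "10 minutes")

Once the struct is flattened there is nothing nested left to resolve, so plain column names work with either the Column API or selectExpr, sidestepping whatever resolution quirk made the original selectExpr version fail.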