
Spark Streaming: Read JSON from Kafka and add event_time

I am trying to write a stateful Spark Structured Streaming job that reads from Kafka. As part of the requirement, I need to add an 'event_time' column to my stream. I am trying something like this:

import org.apache.spark.sql.functions.from_json
import sparkSession.implicits._   // for the $"..." column syntax

// Infer the payload schema from a sample JSON file
val schema = spark.read.json("sample-data/test.json").schema

val myStream = sparkSession
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "myTopic")
      .load()

// Parse the Kafka value bytes as JSON into a single struct column named "value"
val df = myStream.select(from_json($"value".cast("string"), schema).alias("value"))
val withEventTime = df.selectExpr("*", "cast (value.arrivalTime as timestamp) as event_time")

But I keep getting this error message:

cannot resolve 'arrivalTime' given input columns: [value]

How do I refer to all the elements in my JSON?
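For reference, one way to make all of the JSON fields addressable by their bare names is to flatten the struct column that from_json produces. This is only a minimal sketch, assuming the parsed struct column is named "value" as above:

// Expand every field of the "value" struct into a top-level column,
// so an expression such as "arrivalTime" resolves without the "value." prefix.
val flattened = df.select("value.*")

After flattening, an expression like "cast(arrivalTime as timestamp)" should resolve against the top-level columns.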

I believe I was able to solve this using this:

val withEventTime = df.withColumn("event_time", to_timestamp(col("value.arrivalTime")))

Not sure why this worked and not the other one.
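Since the job is meant to be stateful, the new event_time column would typically also be declared as the watermark column so Spark can bound its state. A minimal sketch, assuming a 10-minute lateness tolerance (the threshold is an arbitrary placeholder, not from the original post):

// Declare event_time as the event-time/watermark column; state kept by any
// downstream stateful operator (windowed aggregation, dropDuplicates,
// mapGroupsWithState, ...) can then be cleaned up once events are more than
// 10 minutes late.
val withWatermark = withEventTime.withWatermark("event_time", "10 minutes")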
