
Spark Streaming: Read JSON from Kafka and add event_time

I am trying to write a stateful Spark Structured Streaming job that reads from Kafka. As part of the requirements I need to add an 'event_time' column to my stream. I am trying something like this:

val schema = spark.read.json("sample-data/test.json").schema
val myStream = sparkSession
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "myTopic")
      .load()
val df = myStream.select(from_json($"value".cast("string"), schema).alias("value"))
val withEventTime = df.selectExpr("*", "cast (value.arrivalTime as timestamp) as event_time")

But I keep getting this message:

cannot resolve 'arrivalTime' given input columns: [value]

How do I refer to all the elements in my JSON?

I believe I was able to solve this with:

val withEventTime = df.withColumn("event_time", to_timestamp(col("value.arrivalTime")))

Not sure why this worked and not the other one.
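For reference, another common pattern is to flatten the parsed struct with `value.*` before deriving the timestamp, so every JSON field becomes a plain top-level column. This is a minimal sketch, assuming `arrivalTime` is a top-level field of the JSON payload (schema path, topic name, and broker address as in the question):

```scala
import org.apache.spark.sql.functions.{col, from_json, to_timestamp}

// Infer the schema from a static sample file, as in the question.
val schema = spark.read.json("sample-data/test.json").schema

val withEventTime = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "myTopic")
  .load()
  // Kafka delivers the payload as binary; cast to string and parse the JSON.
  .select(from_json(col("value").cast("string"), schema).alias("value"))
  // Flatten the struct: each JSON field is now addressable by name alone.
  .selectExpr("value.*")
  .withColumn("event_time", to_timestamp(col("arrivalTime")))
```

After `selectExpr("value.*")` the nested prefix is no longer needed, which avoids column-resolution issues like the one in the question, and the resulting `event_time` column can be used directly in a `withWatermark` clause for stateful operations.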

