Spark 2 Kafka Structured Streaming: Java doesn't know the from_json function
I have a question regarding Spark Structured Streaming on a Kafka stream.
I have a schema of this type:
StructType schema = new StructType()
.add("field1", StringType)
.add("field2", StringType)
.add("field3", StringType)
.add("field4", StringType)
.add("field5", StringType);
I bootstrap my stream from a Kafka topic like this:
Dataset<Row> ds1 = spark
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "brokerlist")
.option("zookeeper.connect", "zk_url")
.option("subscribe", "topic")
.option("startingOffsets", "earliest")
.option("max.poll.records", 10)
.option("failOnDataLoss", false)
.load();
Next I cast the key and value to strings:
Dataset<Row> df1 = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
Now I would like to convert the value field (which is JSON) to the schema defined above, which should make SQL queries easier:
Dataset<Row> df2 = df1.select(from_json(col("value"), schema).as("data")).select("data.single_column_field");
It seems that Spark 2.3.1 doesn't know the from_json function?
These are my imports:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.types.StructType;
Any idea how to solve this? Please note that I'm not looking for a Scala solution, but a pure Java-based solution!
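For what it's worth, `from_json` is a static method on `org.apache.spark.sql.functions`, and the import list above does not pull it in. A minimal sketch of how the call might look once the static imports are in place — `data.field1` is just an illustrative field from the schema above, not a claim about the real payload:

```java
// from_json and col live in org.apache.spark.sql.functions;
// without these static imports the compiler cannot resolve them
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// df1 and schema are the objects built earlier in the question
Dataset<Row> df2 = df1
        .select(from_json(col("value"), schema).as("data"))  // parse JSON into a struct column
        .select("data.field1");                              // pick any field out of the struct
```

Alternatively, call it as `functions.from_json(...)` with a plain `import org.apache.spark.sql.functions;`.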
This code works for me. Hope it's helpful.
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "192.168.34.216:9092")
.option("subscribe", "topicName")
.load()
//df.show();
import spark.implicits._
val comingXDR = df.select("value").as[String]
  .withColumn("_tmp", split($"value", "\\,"))
  .withColumn("MyNewColumnName1", $"_tmp".getItem(0))
  .withColumn("MyNewColumnName2", $"_tmp".getItem(1))
  .withColumn("MyNewColumnName3", $"_tmp".getItem(2))
  .withColumn("MyNewColumnName4", $"_tmp".getItem(3))
  .drop("value")
  .drop("_tmp")
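Since the question explicitly asks for Java rather than Scala, the same split-the-value approach can be sketched in pure Java as well. This assumes the same comma-separated payload as the Scala answer; the column names are illustrative and `df` stands for the Dataset loaded from Kafka:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.split;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// df is the Dataset<Row> returned by spark.readStream()...load()
Dataset<Row> comingXDR = df
        .selectExpr("CAST(value AS STRING)")
        .withColumn("_tmp", split(col("value"), ","))            // split the CSV payload into an array
        .withColumn("MyNewColumnName1", col("_tmp").getItem(0))
        .withColumn("MyNewColumnName2", col("_tmp").getItem(1))
        .withColumn("MyNewColumnName3", col("_tmp").getItem(2))
        .withColumn("MyNewColumnName4", col("_tmp").getItem(3))
        .drop("value", "_tmp");                                  // keep only the new columns
```

Note that this sidesteps `from_json` entirely; if the payload really is JSON (as in the question), the `from_json(col("value"), schema)` approach with the proper static import is the more direct fix.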