Read data from a Kafka topic and aggregate using a Spark temp view?
I want to read data from a Kafka topic and create a Spark temp view to group by some columns.
+----+--------------------+
| key| value|
+----+--------------------+
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
But I can't aggregate data from the temp view. Is the value column data stored as a String?
Dataset<Row> data = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
    .option("subscribe", "data2-topic")
    .option("startingOffsets", "latest")
    .option("group.id", "test")
    .option("enable.auto.commit", "true")
    .option("auto.commit.interval.ms", "1000")
    .load();
data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
data.createOrReplaceTempView("Tempdata");
data.show();
Dataset<Row> df2 = spark.sql("SELECT e FROM Tempdata GROUP BY e");
df2.show();
value column data stored as a String???
Yes, because you CAST(value AS STRING).
You'll want to use the from_json function, which will parse the row into a proper dataframe with columns that you can search within.
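As a rough sketch of that approach: the JSON schema below is an assumption based on the {"e":"trade","E":...} payload shown above (field names "e", "E", and "p" are guesses; adjust them to the real message format). It parses the Kafka value with from_json before registering the temp view, and writes the streaming aggregate to the console instead of calling show(), which does not work on streaming datasets.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class TradeAggregation {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("trades").getOrCreate();

        // Assumed schema for the JSON in the Kafka value column; replace the
        // field names and types with the ones your producer actually sends.
        StructType tradeSchema = new StructType()
                .add("e", DataTypes.StringType)   // event type, e.g. "trade"
                .add("E", DataTypes.LongType)     // event time
                .add("p", DataTypes.StringType);  // hypothetical price field

        Dataset<Row> raw = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
                .option("subscribe", "data2-topic")
                .option("startingOffsets", "latest")
                .load();

        // Parse the value bytes into real columns instead of one big string.
        Dataset<Row> trades = raw
                .select(from_json(col("value").cast("string"), tradeSchema).alias("trade"))
                .select("trade.*");

        trades.createOrReplaceTempView("Tempdata");

        // Streaming aggregations need a streaming sink; show() will not work here.
        Dataset<Row> counts = spark.sql("SELECT e, COUNT(*) AS cnt FROM Tempdata GROUP BY e");

        counts.writeStream()
                .outputMode("complete")
                .format("console")
                .start()
                .awaitTermination();
    }
}
```

Note that the consumer options (group.id, enable.auto.commit) from the original snippet are dropped here, since Spark's Kafka source manages its own offsets.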
See Databricks' blog on Structured Streaming on Kafka for some examples.
If the primary goal is just grouping by some fields, then KSQL might be an alternative.