
Read data from a Kafka topic and aggregate using a Spark temp view?

I want to read data from a Kafka topic and create a Spark temp view to group by some columns:

+----+--------------------+
| key|               value|          
+----+--------------------+
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|

But I'm not able to aggregate data from the temp view. Is the value column data stored as a String?

Dataset<Row> data = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
        .option("subscribe", "data2-topic")
        .option("startingOffsets", "latest")
        .option("group.id", "test")
        .option("enable.auto.commit", "true")
        .option("auto.commit.interval.ms", "1000")
        .load();
data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
data.createOrReplaceTempView("Tempdata");
data.show();
Dataset<Row> df2 = spark.sql("SELECT e FROM Tempdata group by e");
df2.show();

value column data stored as a String???

Yes, because you CAST(value AS STRING).

You'll want to use the from_json function, which will parse the row into a proper dataframe that you can query.
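A minimal sketch of that approach, staying with the code from the question. The schema fields here are assumptions inferred from the `{"e":"trade","E":...}` sample in the output; adjust them to match your actual JSON payload. Note that `selectExpr` returns a new Dataset rather than modifying `data` in place, so the result must be assigned, and a streaming aggregation has to be written out with `writeStream` rather than `show()`:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical schema: field names taken from the sample rows shown above;
// add or rename fields to match the real messages on the topic.
StructType schema = new StructType()
        .add("e", DataTypes.StringType)   // event type, e.g. "trade"
        .add("E", DataTypes.LongType)     // event time
        .add("s", DataTypes.StringType)   // symbol (assumed)
        .add("p", DataTypes.StringType);  // price (assumed)

// Cast the Kafka value to a string, parse it as JSON, and flatten the struct.
Dataset<Row> parsed = data
        .selectExpr("CAST(value AS STRING) AS json")
        .select(from_json(col("json"), schema).alias("t"))
        .select("t.*");

parsed.createOrReplaceTempView("Tempdata");

// GROUP BY on a stream needs an aggregate expression and a streaming sink.
Dataset<Row> grouped = spark.sql(
        "SELECT e, COUNT(*) AS cnt FROM Tempdata GROUP BY e");

grouped.writeStream()
        .outputMode("complete")  // required for aggregations without a watermark
        .format("console")
        .start()
        .awaitTermination();
```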

See Databricks' blog on Structured Streaming with Kafka for some examples.

If the primary goal is just grouping by some fields, then KSQL might be an alternative.
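For illustration, a KSQL sketch of the same grouping, again assuming the `e`/`s`/`p` field names from the sample data (the `EMIT CHANGES` clause applies to newer ksqlDB versions; older KSQL omits it):

```sql
-- Declare a stream over the existing topic; field names are assumptions.
CREATE STREAM trades (e VARCHAR, s VARCHAR, p VARCHAR)
  WITH (KAFKA_TOPIC = 'data2-topic', VALUE_FORMAT = 'JSON');

-- Continuously count events per event type.
SELECT e, COUNT(*) AS cnt
FROM trades
GROUP BY e
EMIT CHANGES;
```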
