
Read data from a Kafka topic and aggregate using a Spark temp view?

I want to read data from a Kafka topic and create a Spark temp view to group by some columns:

+----+--------------------+
| key|               value|          
+----+--------------------+
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|

But I'm not able to aggregate data from the temp view. Is the value column data stored as a String?

Dataset<Row> data = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
        .option("subscribe", "data2-topic")
        .option("startingOffsets", "latest")
        .option("group.id", "test")
        .option("enable.auto.commit", "true")
        .option("auto.commit.interval.ms", "1000")
        .load();
data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
data.createOrReplaceTempView("Tempdata");
data.show();
Dataset<Row> df2 = spark.sql("SELECT e FROM Tempdata group by e");
df2.show();

value column data stored as a String???

Yes, because you CAST(value AS STRING).

You'll want to use the from_json function, which will parse the row into a proper dataframe that you can query.
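A minimal sketch of that approach, staying with the code from the question. The schema fields here are assumptions inferred from the `{"e":"trade","E":...}` sample in the output; adjust them to match your actual JSON payload. Note that `selectExpr` returns a new Dataset rather than modifying `data` in place, so the result must be assigned, and a streaming aggregation has to be written out with `writeStream` rather than `show()`:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical schema: field names taken from the sample rows shown above;
// add or rename fields to match the real messages on the topic.
StructType schema = new StructType()
        .add("e", DataTypes.StringType)   // event type, e.g. "trade"
        .add("E", DataTypes.LongType)     // event time
        .add("s", DataTypes.StringType)   // symbol (assumed)
        .add("p", DataTypes.StringType);  // price (assumed)

// Cast the Kafka value to a string, parse it as JSON, and flatten the struct.
Dataset<Row> parsed = data
        .selectExpr("CAST(value AS STRING) AS json")
        .select(from_json(col("json"), schema).alias("t"))
        .select("t.*");

parsed.createOrReplaceTempView("Tempdata");

// GROUP BY on a stream needs an aggregate expression and a streaming sink.
Dataset<Row> grouped = spark.sql(
        "SELECT e, COUNT(*) AS cnt FROM Tempdata GROUP BY e");

grouped.writeStream()
        .outputMode("complete")  // required for aggregations without a watermark
        .format("console")
        .start()
        .awaitTermination();
```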

See Databricks' blog on Structured Streaming with Kafka for some examples.

If the primary goal is just grouping by some fields, then KSQL might be an alternative.
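For illustration, a KSQL sketch of the same grouping, again assuming the `e`/`s`/`p` field names from the sample data (the `EMIT CHANGES` clause applies to newer ksqlDB versions; older KSQL omits it):

```sql
-- Declare a stream over the existing topic; field names are assumptions.
CREATE STREAM trades (e VARCHAR, s VARCHAR, p VARCHAR)
  WITH (KAFKA_TOPIC = 'data2-topic', VALUE_FORMAT = 'JSON');

-- Continuously count events per event type.
SELECT e, COUNT(*) AS cnt
FROM trades
GROUP BY e
EMIT CHANGES;
```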
