简体   繁体   中英

read a data from kafka topic and aggregate using spark tempview?

i want a read a data from kafka topic, and create spark tempview to group by some columns?

+----+--------------------+
| key|               value|          
+----+--------------------+
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|

but i can't able to aggregate data from tempview?? value column data stored as a String???

Dataset<Row> data = spark
                  .readStream()
                  .format("kafka")
                  .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
                  .option("subscribe", "data2-topic")
                  .option("startingOffsets", "latest")
                  .option ("group.id", "test")
                  .option("enable.auto.commit", "true")
                  .option("auto.commit.interval.ms", "1000")          
                  .load();
          data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
          data.createOrReplaceTempView("Tempdata");
          data.show();
Dataset<Row> df2=spark.sql("SELECT e FROM Tempdata group by e");
df2.show();

value column data stored as a String???

Yes.. Because you CAST(value as STRING)

You'll want to use a from_json function that'll load the row into a proper dataframe that you can search within.

See Databrick's blog on Structured Streaming on Kafka for some examples

If the primary goal is just grouping of some fields, then KSQL might be an alternative.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM