Read Data from kafka topic into spark dataframe
private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(sparkSqlMysql.class);
private static final SparkSession sparkSession = SparkSession.builder().master("local[*]").appName("Spark2JdbcDs")
.getOrCreate();
public static void main(String[] args) {
    // Subscribe to the Kafka topic as a streaming source
    Dataset<Row> df = sparkSession.readStream().format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "SqlMessages")
            .load();
}
I want to read data from my Kafka topic into Spark SQL, but I am not able to do so. Can someone guide me on how I can convert my data from the Kafka topic to Spark SQL? Something where I can do this:
Dataset<Row> schoolData = sparkSession.sql("select * from Schools");
Was doing something similar today. I consumed the entire topic from the beginning, converted it to a DataFrame, and saved it as a Parquet table. You can adapt my code from Scala; the idea should be clear.
import spark.implicits._ // needed for jsonDF.as[String]

val topic = "topic_bla_bla"
val brokers = "some_kafka_broker:9092"

// Batch-read the whole topic from the earliest offset
val kafkaDF = spark.read.format("kafka")
  .option("kafka.bootstrap.servers", brokers)
  .option("subscribe", topic)
  .option("startingOffsets", "earliest")
  .option("kafkaConsumer.pollTimeoutMs", "20000")
  .load()

// Kafka values are binary; cast to string before parsing as JSON
val jsonDF = kafkaDF.selectExpr("CAST(value AS STRING)")
val finalDF = spark.read.option("mode", "PERMISSIVE").json(jsonDF.as[String])

finalDF.createOrReplaceTempView("wow_table") // registerTempTable is deprecated
// OR persist it as a Parquet table:
finalDF.write.format("parquet").saveAsTable("default.wow_table")

spark.sql("select * from wow_table")
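Since the question is in Java, the same batch-read idea can be sketched in Java as well. This is a minimal sketch, not a tested implementation: it assumes the broker address and topic name from the question, a running Kafka broker, and that `spark-sql` and `spark-sql-kafka-0-10` are on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaToSparkSql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("KafkaToSparkSql")
                .getOrCreate();

        // Batch-read the whole topic from the earliest offset
        // (spark.read() instead of readStream() gives a finite Dataset)
        Dataset<Row> kafkaDF = spark.read().format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "SqlMessages")
                .option("startingOffsets", "earliest")
                .load();

        // Kafka values are binary; cast to string, then parse as JSON
        Dataset<String> json = kafkaDF.selectExpr("CAST(value AS STRING)")
                .as(Encoders.STRING());
        Dataset<Row> finalDF = spark.read().json(json);

        // Register a view so plain SQL works against the topic data
        finalDF.createOrReplaceTempView("SqlMessages");
        spark.sql("select * from SqlMessages").show();
    }
}
```

Note that if you stay with `readStream()` as in the question, you get a streaming Dataset, which cannot be queried with a plain one-off `spark.sql("select * ...")`; the batch `read()` shown here avoids that limitation when a snapshot of the topic is enough.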