
Read data from a Kafka topic into a Spark DataFrame

private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(sparkSqlMysql.class);

private static final SparkSession sparkSession = SparkSession.builder().master("local[*]").appName("Spark2JdbcDs")
        .getOrCreate();

public static void main(String[] args) {
    // Read the Kafka topic "SqlMessages" as a streaming Dataset

    Dataset<Row> df = sparkSession.readStream().format("kafka").option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "SqlMessages").load();

I want to query the data from my Kafka topic with Spark SQL, but I am not able to do so.

Can someone guide me on how to convert the data from my Kafka topic so that it can be queried with Spark SQL?

Something that lets me do this:

 Dataset<Row> schoolData = sparkSession.sql("select * from Schools");

I was doing something similar today: I consumed an entire topic from the beginning, converted it to a DataFrame, and saved it as a Parquet table. My code is in Scala, but the idea should be clear enough to adapt.

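import spark.implicits._ // provides the Encoder needed by .as[String]
val topic = "topic_bla_bla"
val brokers = "some_kafka_broker:9092"
// Batch read of the whole topic, starting from the earliest offset
val kafkaDF = spark.read.format("kafka").option("kafkaConsumer.pollTimeoutMs", "20000").option("startingOffsets", "earliest").option("kafka.bootstrap.servers", brokers).option("subscribe", topic).load()
// Kafka delivers the value as bytes, so cast it to a JSON string
val jsonDF = kafkaDF.selectExpr("CAST(value AS STRING)")
// Let Spark infer the schema from the JSON strings
val finalDF = spark.read.option("mode", "PERMISSIVE").json(jsonDF.as[String])
// registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView replaces it
finalDF.createOrReplaceTempView("wow_table")
//OR
finalDF.write.format("parquet").saveAsTable("default.wow_table")
spark.sql("select * from wow_table")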
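For the Java code in the question, which uses readStream() rather than a batch read, the same idea could look roughly like the sketch below. It is not a definitive implementation and rests on several assumptions: the messages in the topic are JSON, the field names "id" and "name" in schoolSchema() are hypothetical placeholders for whatever the messages really contain, the class name KafkaToSparkSql is made up, and the spark-sql-kafka-0-10 connector is on the classpath. The broker address, topic name and view name are taken from the question.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class KafkaToSparkSql {

    // Hypothetical schema: replace the fields with those of the JSON in your topic.
    static StructType schoolSchema() {
        return new StructType()
                .add("id", DataTypes.LongType)
                .add("name", DataTypes.StringType);
    }

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("KafkaToSparkSql")
                .getOrCreate();

        // Streaming read of the topic; each row has key, value, topic, partition, offset, ...
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "SqlMessages")
                .option("startingOffsets", "earliest")
                .load();

        // Cast the binary value to a string, parse it as JSON, and flatten the struct
        Dataset<Row> schools = raw
                .selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), schoolSchema()).alias("data"))
                .select("data.*");

        // Register the streaming Dataset so it can be queried by name
        schools.createOrReplaceTempView("Schools");

        // The result of the SQL query is itself a streaming Dataset
        Dataset<Row> schoolData = spark.sql("select * from Schools");

        // A streaming query only runs once a sink is started; print each batch to the console
        StreamingQuery query = schoolData.writeStream()
                .format("console")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```

Unlike the batch version, a streaming Dataset cannot be inspected with show(); the query has to be started through writeStream(), and the console sink here is only a convenient choice for trying it out.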

