
How to use Spark Structured Streaming with Kafka Direct Stream?

I came across Structured Streaming with Spark; it has an example of continuously consuming from an S3 bucket and writing processed results to a MySQL DB.

// Read data continuously from an S3 location
val inputDF = spark.readStream.json("s3://logs")

// Do operations using the standard DataFrame API and write to MySQL
inputDF.groupBy($"action", window($"time", "1 hour")).count()
       .writeStream.format("jdbc")
       .start("jdbc:mysql//...")

How can this be used with Spark Kafka Streaming?

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

Is there a way to combine these two examples without using stream.foreachRDD(rdd => {})?

Is there a way to combine these two examples without using stream.foreachRDD(rdd => {})?

Not yet. Spark 2.0.0 doesn't have Kafka sink support for Structured Streaming. This feature should come out in Spark 2.1.0 according to Tathagata Das, one of the creators of Spark Streaming. Here is the relevant JIRA issue.
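For reference, the foreachRDD route that the question hopes to avoid could look roughly like the sketch below. It assumes the spark-streaming-kafka-0-10 integration and a stream created with KafkaUtils.createDirectStream as in the question; the object name, the helper toPair, and the per-batch count are hypothetical and only illustrate the pattern of turning each micro-batch into a DataFrame.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.InputDStream

object ForeachRddWorkaround {
  // ConsumerRecord is not serializable, so extract plain values before any shuffle.
  def toPair(record: ConsumerRecord[String, String]): (String, String) =
    (record.key, record.value)

  // Hypothetical helper: counts records per value within each micro-batch.
  def countPerBatch(stream: InputDStream[ConsumerRecord[String, String]]): Unit =
    stream.foreachRDD { rdd =>
      // Reuse (or lazily create) a SparkSession on top of the streaming context's SparkContext.
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      val df = rdd.map(toPair).toDF("key", "value")
      df.groupBy($"value").count().show()
    }
}
```

Note that this aggregates each batch separately; it does not carry state across batches the way a Structured Streaming window aggregation does.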

Edit: (29/11/2018)

Yes, it's possible from Spark version 2.2 onwards.

stream
  .writeStream // use `write` for batch, like DataFrame
  .format("kafka")
  .option("kafka.bootstrap.servers", "brokerhost1:port1,brokerhost2:port2")
  .option("topic", "target-topic1")
  .start()
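A slightly fuller sketch of the same write, under a few assumptions: the input DataFrame already has key and value columns, and the checkpoint path is a made-up placeholder. The Kafka sink expects a value column (string or binary) and an optional key column, and a streaming write to Kafka needs a checkpoint location, so both are made explicit here. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

object KafkaSinkSketch {
  // Broker list and topic come from the snippet above; the checkpoint path is a placeholder.
  val sinkOptions: Map[String, String] = Map(
    "kafka.bootstrap.servers" -> "brokerhost1:port1,brokerhost2:port2",
    "topic" -> "target-topic1",
    "checkpointLocation" -> "/tmp/checkpoints/target-topic1"
  )

  // Assumes `stream` is a streaming DataFrame with `key` and `value` columns.
  def writeToKafka(stream: DataFrame): StreamingQuery =
    stream
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .writeStream
      .format("kafka")
      .options(sinkOptions)
      .start()
}
```

Calling KafkaSinkSketch.writeToKafka(stream).awaitTermination() would keep the query running until it is stopped or fails.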

Check this SO post (read and write on Kafka topic with Spark streaming) for more.

Edit: (06/12/2016)

Kafka 0.10 integration for Structured Streaming is now experimentally supported in Spark 2.0.2:

val ds1 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

ds1
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
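Combining the Kafka source above with the windowed count from the question might look like the following sketch. It assumes the spark-sql-kafka-0-10 package is on the classpath, treats the record value as the "action", and uses the Kafka record timestamp in place of the question's time column; both mappings are illustrative assumptions, as are the object name and the console sink.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

object KafkaWindowedCounts {
  // Broker list and topic are the placeholders used in the snippet above.
  val kafkaOptions: Map[String, String] = Map(
    "kafka.bootstrap.servers" -> "host1:port1,host2:port2",
    "subscribe" -> "topic1"
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaWindowedCounts").getOrCreate()

    // The Kafka source yields key, value, topic, partition, offset and timestamp columns.
    val records = spark.readStream
      .format("kafka")
      .options(kafkaOptions)
      .load()

    // Assumption: value holds the action name; the Kafka timestamp stands in for event time.
    val counts = records
      .selectExpr("CAST(value AS STRING) AS action", "timestamp")
      .groupBy(col("action"), window(col("timestamp"), "1 hour"))
      .count()

    // Console sink used only for illustration; complete mode re-emits all window counts.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

No foreachRDD is involved: the aggregation is expressed once on the streaming DataFrame and Spark maintains the window state across micro-batches.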

I was having a similar issue with reading from a Kafka source and writing to a Cassandra sink. I created a simple project here, kafka2spark2cassandra, and am sharing it in case it could be helpful for anyone.

