How to stream data from a SQL table with Apache Spark on Databricks
I am attempting to stream from a SQL table using the following:
my_sales = spark.read.jdbc(jdbcUrl, "dbo.table")
static = spark.read.format("csv").load(my_sales)
dataSchema = static.schema
I am trying to read in the data from the table with the following:
rawdf = (spark.readStream
    .format("csv")
    .option("maxFilesPerTrigger", 1)
    .schema(dataSchema)
    .csv(dataPath)
)
I am using the following to write the data to this location:
saveloc = '/mnt/raw/streaminglocation/'
streamingQuery = (
    rawdf
    .writeStream
    .format("csv")
    .outputMode("append")
    .option("checkpointLocation", f"{saveloc}/_checkpoints")
    .option("mergeSchema", "true")
    .start(saveloc)
)
However, this is failing.
Is it possible to stream from a SQL table?
This is not possible. JDBC sources are not supported as streaming sources in Spark Structured Streaming.
I am not convinced by the upfront coding either. Instead, use change data capture (CDC) with Kafka, materialized updatable views fed by CDC and Kafka, or Debezium.
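To make the Kafka/Debezium route concrete, here is a minimal sketch. It assumes a Debezium connector is already publishing change events for the table to a Kafka topic; the broker address and topic name below are placeholder assumptions, not values from the question:

```python
# Sketch: consume Debezium CDC events from Kafka with Structured Streaming.
# The broker address and topic name are hypothetical placeholders.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "broker:9092",  # assumed broker address
    "subscribe": "server1.dbo.table",          # assumed Debezium topic name
    "startingOffsets": "earliest",
}

def start_cdc_stream(spark, save_loc):
    """Read CDC events from Kafka and append them under save_loc."""
    raw = (spark.readStream
           .format("kafka")              # Kafka IS a supported streaming source
           .options(**KAFKA_OPTIONS)
           .load())
    # Debezium payloads arrive in the Kafka `value` column as JSON bytes;
    # cast to string here, then parse with from_json against your schema.
    events = raw.selectExpr("CAST(value AS STRING) AS json_value")
    return (events.writeStream
            .outputMode("append")
            .option("checkpointLocation", f"{save_loc}/_checkpoints")
            .start(save_loc))
```

Unlike a JDBC read, the Kafka source gives Structured Streaming incremental offsets to track, which is why CDC-into-Kafka is the usual way to "stream" a SQL table.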