How to stream data from a SQL table with Apache Spark on Databricks
I am attempting to stream from a SQL table using the following:
my_sales = spark.read.jdbc(jdbcUrl, "dbo.table")
static = spark.read.format("csv").load(my_sales)
dataSchema = static.schema
I am trying to read in the data from the table with the following:
rawdf = (spark.readStream
    .format("csv")
    .option("maxFilesPerTrigger", 1)
    .schema(dataSchema)
    .csv(dataPath)
)
I am using the following to write the data to this location:
saveloc = '/mnt/raw/streaminglocation/'
streamingQuery = (
    rawdf
    .writeStream
    .format("csv")
    .outputMode("append")
    .option("checkpointLocation", f"{saveloc}/_checkpoints")
    .option("mergeSchema", "true")
    .start(saveloc)
)
However, this is failing.
Is it possible to stream from a SQL table?
This is not possible. JDBC sources are not supported as streaming sources in Spark Structured Streaming.
I am not convinced by the upfront coding either. Instead, use change data capture (CDC) with Kafka, materialized updatable views fed by CDC and Kafka, or Debezium.
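To make the Kafka/Debezium route concrete, here is a minimal sketch. It assumes a Debezium connector is already publishing change events for the table to a Kafka topic; the broker address and topic name below are placeholder assumptions, not values from the question:

```python
# Sketch: consume Debezium CDC events from Kafka with Structured Streaming.
# The broker address and topic name are hypothetical placeholders.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "broker:9092",  # assumed broker address
    "subscribe": "server1.dbo.table",          # assumed Debezium topic name
    "startingOffsets": "earliest",
}

def start_cdc_stream(spark, save_loc):
    """Read CDC events from Kafka and append them under save_loc."""
    raw = (spark.readStream
           .format("kafka")              # Kafka IS a supported streaming source
           .options(**KAFKA_OPTIONS)
           .load())
    # Debezium payloads arrive in the Kafka `value` column as JSON bytes;
    # cast to string here, then parse with from_json against your schema.
    events = raw.selectExpr("CAST(value AS STRING) AS json_value")
    return (events.writeStream
            .outputMode("append")
            .option("checkpointLocation", f"{save_loc}/_checkpoints")
            .start(save_loc))
```

Unlike a JDBC read, the Kafka source gives Structured Streaming incremental offsets to track, which is why CDC-into-Kafka is the usual way to "stream" a SQL table.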