
How reliable is a Spark stream join with a static Databricks Delta table?

In Databricks there is a cool feature that allows joining a streaming DataFrame with a Delta table. The cool part is that changes in the Delta table are still reflected in subsequent join results. It works just fine, but I'm curious how this works and what the limitations are. E.g., what's the expected update delay? How does it change as the Delta table grows? Is it safe to rely on it in production?
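For context, here is a minimal sketch of the pattern I mean. The paths (`/mnt/events`, `/mnt/dim_customers`) and the `customer_id` join key are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Streaming side: a Delta table read as a stream
events = spark.readStream.format("delta").load("/mnt/events")

# Static side: a plain batch read of a Delta table; in a stream-static
# join the table's latest snapshot is resolved for each micro-batch,
# which is why later updates to it show up in later join results
customers = spark.read.format("delta").load("/mnt/dim_customers")

query = (
    events.join(customers, "customer_id", "left")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/enriched")
    .start("/mnt/enriched_events")
)
```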

Yes, you can rely on this feature (it's really a Spark feature, not Databricks-specific) - many customers are using it in production. Regarding the other questions - there are multiple aspects here, depending on factors such as how often the table is updated:

  • Because the static Delta table isn't cached, it's re-read on each micro-batch, so updates to it are picked up on the next micro-batch after they commit. Depending on the cluster configuration, this re-read may not be very expensive if you use Delta caching, since unchanged files aren't re-downloaded every time; only new data is downloaded (see the first sketch after this list).
  • Read performance can also suffer if the table consists of a lot of small files; that depends on how you're writing into the table and whether you run things like OPTIMIZE (see the second sketch after this list).
  • Depending on how often the Delta table is updated, you can cache it and periodically refresh it (see the third sketch after this list).
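On the first point: the Databricks disk cache (formerly called Delta caching) can be enabled per session as sketched below. It only takes effect on worker instance types with local SSDs, and on some instance types it is on by default:

```python
# Databricks disk cache ("Delta cache"): keeps copies of the remote
# Parquet files on the workers' local SSDs, so the static table's
# unchanged files aren't re-fetched from cloud storage every micro-batch
spark.conf.set("spark.databricks.io.cache.enabled", "true")
```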
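On the second point: a sketch of compacting the static table with OPTIMIZE (the table name and join key are illustrative). Fewer, larger files mean each micro-batch re-read touches less metadata and fewer objects:

```python
# Compact many small files into fewer large ones
spark.sql("OPTIMIZE dim_customers")

# Optionally co-locate rows on the join key so data skipping
# can prune files during the join
spark.sql("OPTIMIZE dim_customers ZORDER BY (customer_id)")
```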
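On the third point: one way to cache and periodically refresh the static side is to run the join inside foreachBatch, so it is planned fresh on every micro-batch against whichever snapshot is currently cached. This is a sketch under the same hypothetical names, with a made-up refresh interval; the trade-off is that table updates become visible only after the next refresh rather than on the next micro-batch:

```python
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

REFRESH_SECS = 600                 # hypothetical refresh interval
_snapshot = {"df": None, "loaded_at": 0.0}

def get_customers():
    """Return a cached snapshot of the static table,
    reloading it once it is older than REFRESH_SECS."""
    now = time.time()
    if _snapshot["df"] is None or now - _snapshot["loaded_at"] > REFRESH_SECS:
        if _snapshot["df"] is not None:
            _snapshot["df"].unpersist()
        df = spark.read.format("delta").load("/mnt/dim_customers")
        df.cache()
        df.count()                 # materialize the cache
        _snapshot["df"], _snapshot["loaded_at"] = df, now
    return _snapshot["df"]

def enrich(batch_df, batch_id):
    # foreachBatch hands us a plain batch DataFrame, so this join is
    # re-planned every micro-batch and sees the refreshed snapshot
    (batch_df.join(get_customers(), "customer_id", "left")
        .write.format("delta").mode("append")
        .save("/mnt/enriched_events"))

query = (
    spark.readStream.format("delta").load("/mnt/events")
    .writeStream
    .foreachBatch(enrich)
    .option("checkpointLocation", "/mnt/checkpoints/enriched")
    .start()
)
```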

But really, to answer completely, you'd need to provide more information specific to your code, use case, etc.
