
In-memory cache persisted between batches in Spark Structured Streaming

What is the proper way to have an in-memory LRU cache, in a Scala application running on Spark Structured Streaming, that stays persisted across batches?

I tried using a Guava cache, but I think that because it is not serializable, a new cache gets instantiated with every micro-batch even though I declared it as a singleton.

In order to process events I need to look up some metadata in an external data source, so I want to avoid a network round trip on every call and instead cache the results locally for a certain amount of time.
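One common workaround (a sketch, not the only answer) is to hold the cache in a JVM-level singleton. A Scala `object` is initialized lazily, once per JVM, when a task first touches it on an executor; it is never serialized with the task closure, so the same instance survives across micro-batches for the lifetime of that executor. The sketch below uses a `java.util.LinkedHashMap` in access order with `removeEldestEntry` as a stand-in for Guava's `CacheBuilder.maximumSize(...).expireAfterWrite(...)`; the names `MetadataCache` and `getOrFetch` are hypothetical, and the sizes and TTL are placeholders.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical per-executor singleton. A Scala `object` is created once per
// JVM on first use, so nothing here travels with the serialized task closure.
object MetadataCache {
  private val MaxEntries = 10000
  private val TtlMillis  = 10 * 60 * 1000L // entries older than 10 min are re-fetched

  private case class Entry(value: String, insertedAt: Long)

  // access-order LinkedHashMap + removeEldestEntry = a simple LRU,
  // standing in for Guava's CacheBuilder
  private val cache = new JLinkedHashMap[String, Entry](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[String, Entry]): Boolean =
      size() > MaxEntries
  }

  def getOrFetch(key: String)(fetch: String => String): String = synchronized {
    val now = System.currentTimeMillis()
    Option(cache.get(key)) match {
      case Some(e) if now - e.insertedAt < TtlMillis => e.value
      case _ =>
        val v = fetch(key) // the network lookup happens only on a miss or expiry
        cache.put(key, Entry(v, now))
        v
    }
  }
}
```

Inside the streaming query (e.g. in a `map` or `mapPartitions`), you would call something like `MetadataCache.getOrFetch(id)(lookupFromExternalSource)`. Each executor builds its own copy, so the cache is per-JVM, not cluster-wide, and it disappears if the executor is restarted.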

You could try writing your own logic inside `mapGroupsWithState` or `flatMapGroupsWithState`.

These operators give you a per-key state store that can hold computed values for lookup across micro-batches.

See this link: https://databricks.com/blog/2017/10/17/arbitrary-stateful-processing-in-apache-sparks-structured-streaming.html
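To make the suggestion concrete, here is a non-runnable sketch of enriching events via `flatMapGroupsWithState`, fetching metadata only when no state exists for a key yet. The `Event`/`Metadata`/`Enriched` types, the `fetchMetadata` call, and the timeout value are all assumptions to adapt to your schema; the operator signatures are Spark's.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical schema -- replace with your own types.
case class Event(deviceId: String, payload: String)
case class Metadata(region: String)
case class Enriched(deviceId: String, payload: String, region: String)

object EnrichWithState {
  // Placeholder for the external lookup you want to avoid repeating.
  def fetchMetadata(deviceId: String): Metadata = ???

  // Invoked once per key per micro-batch; GroupState survives across batches.
  def enrich(deviceId: String,
             events: Iterator[Event],
             state: GroupState[Metadata]): Iterator[Enriched] = {
    val meta =
      if (state.exists) state.get
      else {
        val m = fetchMetadata(deviceId) // network call only on first sight of the key
        state.update(m)
        state.setTimeoutDuration("10 minutes") // state expires, forcing a re-fetch
        m
      }
    events.map(e => Enriched(e.deviceId, e.payload, meta.region))
  }

  def run(events: Dataset[Event]): Unit = {
    import events.sparkSession.implicits._
    events
      .groupByKey(_.deviceId)
      .flatMapGroupsWithState(OutputMode.Append,
        GroupStateTimeout.ProcessingTimeTimeout)(enrich)
      .writeStream
      .format("console")
      .start()
  }
}
```

Note that this state is managed by Spark's checkpointed state store, keyed per group, rather than being a process-local cache, so it is fault-tolerant but also shuffles events by key.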

