What is the proper way to maintain an in-memory LRU cache in a Scala application running on Spark Structured Streaming, so that the cache persists across micro-batches?
I tried using a Guava cache, but even though I declare it as a singleton, a new cache seems to be instantiated with every micro-batch — I suspect because the cache is not serializable.
To process events I need to look up metadata in an external data source, so I want to avoid a network round-trip on every call and instead cache the results locally for a certain amount of time.
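For reference, here is a minimal sketch of the singleton pattern described above, using Guava's `CacheBuilder`. The key point is that a Scala `object` is initialized once per executor JVM and is never serialized with the task closure, so the cache survives across micro-batches as long as the executor lives. `fetchMetadata` and the key/value types are placeholders for the actual remote lookup:

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Per-executor-JVM singleton: initialized lazily on first access,
// never shipped inside a serialized task closure.
object MetadataCache {
  // Placeholder for the network call to the external data source.
  def fetchMetadata(key: String): String = ???

  lazy val cache: LoadingCache[String, String] =
    CacheBuilder.newBuilder()
      .maximumSize(10000)                      // LRU size bound
      .expireAfterWrite(10, TimeUnit.MINUTES)  // time-based expiry
      .build(new CacheLoader[String, String] {
        override def load(key: String): String = fetchMetadata(key)
      })
}

// Reference the object *inside* the lambda so only the key crosses
// the closure boundary, e.g.:
// df.map(row => MetadataCache.cache.get(row.getString(0)))
```

If the cache is instead held as a field of a class that gets captured in the closure, Spark will try to serialize it (or fail), which would explain a fresh cache appearing on every batch.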
You could try writing your own logic inside `mapGroupsWithState` or `flatMapGroupsWithState`.
These operators give you access to Spark's state store, which can hold computed values for lookup across micro-batches.
See this link: https://databricks.com/blog/2017/10/17/arbitrary-stateful-processing-in-apache-sparks-structured-streaming.html
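To illustrate, here is a rough sketch of that approach, assuming hypothetical `Event`/`Enriched` types and a placeholder `fetchMetadata` for the remote call. The looked-up metadata is kept in `GroupState` per key, with a processing-time timeout acting as the expiry:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical input and output record types.
case class Event(key: String, payload: String)
case class Enriched(key: String, payload: String, metadata: String)

// What we keep in Spark's state store per key.
case class CachedMeta(value: String, fetchedAt: Long)

// Placeholder for the network lookup against the external source.
def fetchMetadata(key: String): String = ???

def enrich(key: String,
           events: Iterator[Event],
           state: GroupState[CachedMeta]): Iterator[Enriched] = {
  if (state.hasTimedOut) {
    state.remove()                            // expiry fired: drop stale entry
    Iterator.empty
  } else {
    val meta = state.getOption match {
      case Some(cached) => cached.value       // hit: reuse stored metadata
      case None =>
        val fresh = fetchMetadata(key)        // miss: one network call
        state.update(CachedMeta(fresh, System.currentTimeMillis()))
        state.setTimeoutDuration("10 minutes") // time-based eviction
        fresh
    }
    events.map(e => Enriched(e.key, e.payload, meta))
  }
}

// events.groupByKey(_.key)
//   .flatMapGroupsWithState(OutputMode.Append,
//     GroupStateTimeout.ProcessingTimeTimeout)(enrich)
```

Note the trade-off: this state is managed (and checkpointed) by Spark per grouping key, rather than being a process-local LRU cache, so it survives executor restarts but requires reshaping the query around `groupByKey`.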