
In-memory cache persisted between batches in Spark Structured Streaming

What is the proper way to have an in-memory LRU cache, in a Scala application running on Spark Structured Streaming, that stays persisted across batches?

I tried using a Guava cache, but I think that because it is not serializable, a new cache gets instantiated with every micro-batch even though I declared it as a singleton.

In order to process events I need to look up some metadata in an external data source, so I want to avoid a network round trip on every call and instead cache the results locally for a certain amount of time.
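One common workaround (a sketch, not the only answer) is to hold the cache in a JVM-level singleton. A Scala `object` is initialized lazily, once per JVM, when a task first touches it on an executor; it is never serialized with the task closure, so the same instance survives across micro-batches for the lifetime of that executor. The sketch below uses a `java.util.LinkedHashMap` in access order with `removeEldestEntry` as a stand-in for Guava's `CacheBuilder.maximumSize(...).expireAfterWrite(...)`; the names `MetadataCache` and `getOrFetch` are hypothetical, and the sizes and TTL are placeholders.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical per-executor singleton. A Scala `object` is created once per
// JVM on first use, so nothing here travels with the serialized task closure.
object MetadataCache {
  private val MaxEntries = 10000
  private val TtlMillis  = 10 * 60 * 1000L // entries older than 10 min are re-fetched

  private case class Entry(value: String, insertedAt: Long)

  // access-order LinkedHashMap + removeEldestEntry = a simple LRU,
  // standing in for Guava's CacheBuilder
  private val cache = new JLinkedHashMap[String, Entry](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[String, Entry]): Boolean =
      size() > MaxEntries
  }

  def getOrFetch(key: String)(fetch: String => String): String = synchronized {
    val now = System.currentTimeMillis()
    Option(cache.get(key)) match {
      case Some(e) if now - e.insertedAt < TtlMillis => e.value
      case _ =>
        val v = fetch(key) // the network lookup happens only on a miss or expiry
        cache.put(key, Entry(v, now))
        v
    }
  }
}
```

Inside the streaming query (e.g. in a `map` or `mapPartitions`), you would call something like `MetadataCache.getOrFetch(id)(lookupFromExternalSource)`. Each executor builds its own copy, so the cache is per-JVM, not cluster-wide, and it disappears if the executor is restarted.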

You could try writing your own logic inside `mapGroupsWithState` or `flatMapGroupsWithState`.

These operators give you a per-key state store that can hold computed values for lookup across micro-batches.

See this link: https://databricks.com/blog/2017/10/17/arbitrary-stateful-processing-in-apache-sparks-structured-streaming.html
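To make the suggestion concrete, here is a non-runnable sketch of enriching events via `flatMapGroupsWithState`, fetching metadata only when no state exists for a key yet. The `Event`/`Metadata`/`Enriched` types, the `fetchMetadata` call, and the timeout value are all assumptions to adapt to your schema; the operator signatures are Spark's.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical schema -- replace with your own types.
case class Event(deviceId: String, payload: String)
case class Metadata(region: String)
case class Enriched(deviceId: String, payload: String, region: String)

object EnrichWithState {
  // Placeholder for the external lookup you want to avoid repeating.
  def fetchMetadata(deviceId: String): Metadata = ???

  // Invoked once per key per micro-batch; GroupState survives across batches.
  def enrich(deviceId: String,
             events: Iterator[Event],
             state: GroupState[Metadata]): Iterator[Enriched] = {
    val meta =
      if (state.exists) state.get
      else {
        val m = fetchMetadata(deviceId) // network call only on first sight of the key
        state.update(m)
        state.setTimeoutDuration("10 minutes") // state expires, forcing a re-fetch
        m
      }
    events.map(e => Enriched(e.deviceId, e.payload, meta.region))
  }

  def run(events: Dataset[Event]): Unit = {
    import events.sparkSession.implicits._
    events
      .groupByKey(_.deviceId)
      .flatMapGroupsWithState(OutputMode.Append,
        GroupStateTimeout.ProcessingTimeTimeout)(enrich)
      .writeStream
      .format("console")
      .start()
  }
}
```

Note that this state is managed by Spark's checkpointed state store, keyed per group, rather than being a process-local cache, so it is fault-tolerant but also shuffles events by key.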

