I am working on a financial anti-fraud system based on Apache Flink. I need to calculate many different aggregates over financial transactions, and I use Kafka as the stream data source. For example, to calculate the average transaction amount I use MapState to store the total transaction count and the total amount per card. The aggregated data is stored in Apache Accumulo. I know about persistent state in Flink, but it is not what I need. Is there any way to load initial data into Flink before the computation begins? Can it be done with two connected streams: one carrying the latest computed aggregates from Accumulo and one carrying the transactions? The transactions stream is infinite, but the aggregates stream is not. Which direction should I look in? Any help is appreciated.
I've thought about AsyncIO, but state can't be used in async functions. My idea was to check the in-memory state for the card's aggregates first. If there is no data for the card, the code calls the storage service, fetches the aggregates, performs the computation and updates the in-memory state, so the next transaction for that card does not need a call to the external service. But I think this would be a big bottleneck.
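The lookup-on-first-use idea can be sketched in plain Java, without any Flink classes. This is only an illustration under assumptions: `CardAggregate` and `LazyAggregateCache` are hypothetical names, the `HashMap` stands in for the per-card state, and the `Function` passed to the constructor stands in for the call to the storage service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Running totals for one card; the average is derived on demand.
final class CardAggregate {
    long count;
    double totalAmount;

    CardAggregate(long count, double totalAmount) {
        this.count = count;
        this.totalAmount = totalAmount;
    }

    double average() {
        return count == 0 ? 0.0 : totalAmount / count;
    }
}

// Hypothetical helper, not a Flink class: it asks external storage for a
// card's aggregate only on the first transaction seen for that card.
final class LazyAggregateCache {
    private final Map<String, CardAggregate> state = new HashMap<>();
    private final Function<String, CardAggregate> storageLookup;
    int storageCalls = 0; // exposed only to show how often storage is hit

    LazyAggregateCache(Function<String, CardAggregate> storageLookup) {
        this.storageLookup = storageLookup;
    }

    // Applies one transaction and returns the updated average for the card.
    double onTransaction(String cardId, double amount) {
        CardAggregate agg = state.get(cardId);
        if (agg == null) {
            // Cache miss: one call to the (assumed) storage service.
            storageCalls++;
            CardAggregate stored = storageLookup.apply(cardId);
            // Cards unknown to storage start from zero; known ones are copied.
            agg = (stored != null)
                    ? new CardAggregate(stored.count, stored.totalAmount)
                    : new CardAggregate(0, 0.0);
            state.put(cardId, agg);
        }
        agg.count++;
        agg.totalAmount += amount;
        return agg.average();
    }
}
```

With this shape, storage is hit once per card rather than once per transaction, so the cost depends on how many distinct cards appear and how slow a single lookup is.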
You could try this way:
TASK::setInitialState
TASK::invoke
    create basic utils (config, etc) and load the chain of operators
    setup-operators
    task-specific-init
    initialize-operator-states
    open-operators
    run
    close-operators
    dispose-operators
    task-specific-cleanup
    common-cleanup
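The point of the sequence above is that state initialization and open happen before run, so initial data loaded in one of those steps is in place before the first transaction is processed. A minimal plain-Java model of that ordering follows. It does not use Flink's classes: `SketchOperator`, `SeededCountOperator` and `SketchTask` are hypothetical names, and the `initialCounts` map stands in for aggregates read from Accumulo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an operator; this is NOT Flink's operator interface.
interface SketchOperator {
    void initializeState();
    void open();
    void processElement(String cardId);
    void close();
}

// Counts transactions per card, starting from counts loaded in open().
final class SeededCountOperator implements SketchOperator {
    final List<String> calls = new ArrayList<>();          // records the call order
    final Map<String, Long> countPerCard = new HashMap<>();
    private final Map<String, Long> initialCounts;         // assumed external data

    SeededCountOperator(Map<String, Long> initialCounts) {
        this.initialCounts = initialCounts;
    }

    @Override public void initializeState() {
        calls.add("initialize-operator-states");
    }

    @Override public void open() {
        calls.add("open-operators");
        // Load the initial aggregates before any element is processed.
        countPerCard.putAll(initialCounts);
    }

    @Override public void processElement(String cardId) {
        calls.add("run");
        countPerCard.merge(cardId, 1L, Long::sum);
    }

    @Override public void close() {
        calls.add("close-operators");
    }
}

// Drives one operator through a subset of the steps listed above, in order.
final class SketchTask {
    static void invoke(SketchOperator op, List<String> cardIds) {
        op.initializeState();
        op.open();
        for (String cardId : cardIds) {
            op.processElement(cardId);
        }
        op.close();
    }
}
```

One caveat when carrying this over to a real job: as far as I know, Flink keyed state can only be accessed while an element for some key is being processed, so data loaded in open() would more likely go into an ordinary in-memory structure than into keyed state.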