
How to properly initialize task state in Apache Flink?

I am working on a financial anti-fraud system based on Apache Flink. I need to calculate many different aggregates over financial transactions, and I use Kafka as the stream data source. For example, to calculate the average transaction amount I use MapState to store the total transaction count and total amount per card. The aggregated data is stored in Apache Accumulo. I know about Flink's persistent state, but that is not what I need. Is there any way to load initial data into Flink before the computation begins? Could it be done with two connected streams: one carrying the latest computed aggregates from Accumulo, and the other carrying the transactions? The transaction stream is unbounded, but the aggregates stream is not. Which direction should I look in? Any help is appreciated.

I've thought about Async I/O, but state can't be used with async functions. My idea was to check the in-memory state for the card's aggregates first. If there is no data for the card, the code calls the storage service, fetches the aggregates, performs the computation, and updates the in-memory state, so the next transaction for that card does not need a call to the external service. But I think the external call would be a big bottleneck.
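The check-then-fetch idea described above can be modelled in plain Java. This is only a sketch of the logic, not Flink API: the `HashMap` stands in for the per-card MapState, and the `store` function is a hypothetical stand-in for the Accumulo lookup. The class and field names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of "load on first miss"; none of this is Flink API.
final class CardAverages {
    // Running totals for one card.
    static final class Totals {
        long count;
        double amount;
        Totals(long count, double amount) { this.count = count; this.amount = amount; }
    }

    private final Map<String, Totals> state = new HashMap<>(); // stands in for keyed state
    private final Function<String, Totals> store;               // hypothetical external lookup
    int storeCalls = 0;                                         // how many times the store was hit

    CardAverages(Function<String, Totals> store) { this.store = store; }

    // Processes one transaction and returns the card's updated average amount.
    double onTransaction(String cardId, double amount) {
        Totals t = state.get(cardId);
        if (t == null) {
            // First transaction seen for this card: seed the totals from the store.
            storeCalls++;
            Totals stored = store.apply(cardId);
            t = (stored == null) ? new Totals(0, 0.0) : new Totals(stored.count, stored.amount);
            state.put(cardId, t);
        }
        t.count++;
        t.amount += amount;
        return t.amount / t.count;
    }
}
```

In this model the external store is consulted once per card rather than once per transaction, so the cost of the lookup depends on how many distinct cards appear, not on the transaction volume.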

You could try this way:

TASK::setInitialState
TASK::invoke
    create basic utils (config, etc) and load the chain of operators
    setup-operators
    task-specific-init
    initialize-operator-states
    open-operators
    run
    close-operators
    dispose-operators
    task-specific-cleanup
    common-cleanup
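To illustrate what the ordering in the listing implies, here is a plain-Java model of the initialize, open, run, close sequence. It is a sketch under assumptions, not Flink code: the class, its methods, and the `snapshot` map (a hypothetical export of the stored aggregates) are invented for illustration. Note also that in Flink itself keyed state has no key in scope during `open()`, so this model preloads an ordinary in-memory map rather than keyed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the hook order in the listing above; these are not Flink classes.
final class LifecycleSketch {
    final List<String> calls = new ArrayList<>();          // records the order of hook calls
    final Map<String, double[]> totals = new HashMap<>();  // cardId -> {count, amount}
    private final Map<String, double[]> snapshot;          // hypothetical stored aggregates

    LifecycleSketch(Map<String, double[]> snapshot) { this.snapshot = snapshot; }

    void initializeState() { calls.add("initialize-operator-states"); }

    // Runs once before any record: copy the stored aggregates into memory.
    void open() {
        calls.add("open-operators");
        for (Map.Entry<String, double[]> e : snapshot.entrySet()) {
            totals.put(e.getKey(), e.getValue().clone());
        }
    }

    // Handles one transaction and returns the card's updated average amount.
    double process(String cardId, double amount) {
        double[] t = totals.computeIfAbsent(cardId, k -> new double[] {0, 0});
        t[0] += 1;
        t[1] += amount;
        return t[1] / t[0];
    }

    void close() { calls.add("close-operators"); }

    // Mirrors the middle of TASK::invoke: initialize, open, run, close.
    List<Double> invoke(List<String> cardIds, List<Double> amounts) {
        List<Double> averages = new ArrayList<>();
        initializeState();
        open();
        calls.add("run");
        try {
            for (int i = 0; i < cardIds.size(); i++) {
                averages.add(process(cardIds.get(i), amounts.get(i)));
            }
        } finally {
            close();
        }
        return averages;
    }
}
```

Because `open()` completes before the first record is processed, every transaction in the run phase already sees the preloaded totals.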
