
Data consistency problem when Flink updates HBase

Two numbers with the same key arrive at an operator such as map. For the first number, the operator gets the current value for that key from HBase, adds the two together, and writes the new value back to HBase through a sink (dataStream.write(new HBaseOutputFormat(), 0L)). The second number goes through the same steps. Is it possible that the second number reads the value from HBase before the first number's update has reached HBase? If I chain the operator and the sink together, can I avoid this problem? If not, what should I do? Thanks!

What you need is the keyBy function from the DataStream API, or groupBy if you use the DataSet API (see the Flink documentation). These functions make sure that all records with a particular key are always processed by the same parallel instance of the downstream operator. Each parallel instance runs in a single thread, so your two numbers are processed sequentially, even if the parallelism is greater than 1.
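The routing guarantee can be illustrated without a cluster. The following is a plain-Java simulation, not Flink code: the hypothetical FakeTable class stands in for the HBase table, each single-threaded executor plays the role of one parallel operator instance, and records are assigned to executors by a hash of their key. This only loosely mimics keyBy (Flink's actual key-to-subtask assignment differs in detail), but it shows the property that matters: every get-add-put for a given key runs on the same thread, in arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java simulation of the keyBy guarantee; no Flink or HBase classes are used.
class KeyedRoutingSketch {

    // Hypothetical in-memory stand-in for the HBase table: row key -> running sum.
    // ConcurrentHashMap because different workers update different keys at the same time.
    static final class FakeTable {
        private final Map<String, Long> rows = new ConcurrentHashMap<>();
        long get(String key) { return rows.getOrDefault(key, 0L); }
        void put(String key, long value) { rows.put(key, value); } // applied before returning
        Map<String, Long> snapshot() { return new HashMap<>(rows); }
    }

    // Sends each record to one of `parallelism` single-threaded workers chosen by key hash,
    // so all records with the same key are handled by the same thread in arrival order.
    static Map<String, Long> run(List<String> keys, long[] values, int parallelism) {
        FakeTable table = new FakeTable();
        List<ExecutorService> workers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            long value = values[i];
            ExecutorService worker = workers.get(Math.floorMod(key.hashCode(), parallelism));
            // get -> add -> put; safe because no other thread ever touches this key.
            worker.submit(() -> table.put(key, table.get(key) + value));
        }
        for (ExecutorService worker : workers) {
            worker.shutdown(); // accept no new tasks; already-submitted ones still run
        }
        try {
            for (ExecutorService worker : workers) {
                if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("worker did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.snapshot();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k1", "k2", "k1", "k1", "k2");
        long[] values = {10, 5, 20, 30, 7};
        Map<String, Long> result = run(keys, values, 3);
        System.out.println(result.get("k1") + " " + result.get("k2")); // prints "60 12"
    }
}
```

Records for different keys may be processed in parallel on different workers, but records for the same key never are, so k1 always ends up as 60 and k2 as 12 regardless of thread scheduling.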

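Of course, you also have to make sure that the put operation to HBase is blocking; otherwise the get for the second number can still run before the first number's put has been applied. This means you can't use asynchronous ways of interacting with HBase such as the BufferedMutator or Async I/O operators.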
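To see why the put has to be blocking, here is a second sketch, again in plain Java and not using the HBase client. BlockingWriter applies each put immediately, while the hypothetical BufferingWriter only applies puts when flush() is called, loosely modelling a client-side write buffer. Both process the values for one key sequentially on one thread, as keyBy would arrange.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not the HBase client API: shows why a read-modify-write
// needs each put to be visible before the next get for the same key.
class BufferedWriteSketch {

    // Minimal write interface used only by this sketch.
    interface Writer {
        void write(String key, long value);
        void flush();
    }

    // The put lands in the table before write() returns, like a blocking put.
    static final class BlockingWriter implements Writer {
        private final Map<String, Long> table;
        BlockingWriter(Map<String, Long> table) { this.table = table; }
        public void write(String key, long value) { table.put(key, value); }
        public void flush() { /* nothing is buffered */ }
    }

    // Puts are queued and only applied on flush(), loosely modelling a client-side write buffer.
    static final class BufferingWriter implements Writer {
        private final Map<String, Long> table;
        private final List<Map.Entry<String, Long>> pending = new ArrayList<>();
        BufferingWriter(Map<String, Long> table) { this.table = table; }
        public void write(String key, long value) { pending.add(Map.entry(key, value)); }
        public void flush() {
            for (Map.Entry<String, Long> e : pending) {
                table.put(e.getKey(), e.getValue());
            }
            pending.clear();
        }
    }

    // Processes the values for one key in order on a single thread,
    // doing get -> add -> put for each, then returns the stored total.
    static long accumulate(long[] values, boolean buffered) {
        Map<String, Long> table = new HashMap<>();
        Writer writer = buffered ? new BufferingWriter(table) : new BlockingWriter(table);
        String key = "row-1"; // hypothetical row key
        for (long v : values) {
            long current = table.getOrDefault(key, 0L); // reads the table, not the buffer
            writer.write(key, current + v);
        }
        writer.flush();
        return table.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        long[] values = {1, 1, 1};
        System.out.println("blocking=" + accumulate(values, false)
                + " buffered=" + accumulate(values, true)); // prints "blocking=3 buffered=1"
    }
}
```

With the buffering writer every get still sees the old value, so three increments of 1 collapse into a final value of 1 instead of 3, even though the processing itself is strictly sequential.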
