
How to get the globally declared MapState value in RichCoMapFunction [Apache Flink]?

I'm implementing a Flink DataStream job for some real-time calculations. I receive data from two types of source and need to apply a transformation based on a key. When I use RichCoMapFunction, the MapState is not visible globally. My program is as follows:

    import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.co.RichCoMapFunction

    class Transformer extends RichCoMapFunction[(String, Map[String, String]), (String, Map[String, String]), Map[String, String]] {

      // Keyed state handles, initialised in open()
      private var sourceMap1: MapState[String, Map[String, String]] = _
      private var sourceMap2: MapState[String, Map[String, String]] = _

      override def map1(in1: (String, Map[String, String])): Map[String, String] = {
        sourceMap1.put(in1._2("key"), in1._2)
        println(sourceMap1.keys())  // Working with updated values
        println(sourceMap2.keys())  // Always returns an empty value
        in1._2
      }

      override def map2(in2: (String, Map[String, String])): Map[String, String] = {
        sourceMap2.put(in2._2("key"), in2._2)
        println(sourceMap1.keys())  // Always returns an empty value
        println(sourceMap2.keys())  // Working with updated values
        in2._2
      }

      override def open(parameters: Configuration): Unit = {
        val desc1 = new MapStateDescriptor[String, Map[String, String]]("sourceMap1", classOf[String], classOf[Map[String, String]])
        sourceMap1 = getRuntimeContext.getMapState(desc1)
        val desc2 = new MapStateDescriptor[String, Map[String, String]]("sourceMap2", classOf[String], classOf[Map[String, String]])
        sourceMap2 = getRuntimeContext.getMapState(desc2)
      }
    }

I need to access sourceMap2 in the map1 function, since it is declared globally (as a class field). However, when I print the keys of sourceMap2 in map1, it always comes back empty, whereas printing sourceMap1 in map1 shows all the added keys.

When using keyed state, Flink stores a separate state value for each key. This means that if you have a stateful mapper m with state s and you process records (x1, y1) and (x2, y2), where x is the key, Flink will store s(x1) = (x1, v1) and s(x2) = (x2, v2) in its state backend, where v1 and v2 are the state values held for keys x1 and x2.

When processing (x2, y2), you only have access to s(x2); it is not possible to access s(x1).

I assume that this is why you see an apparently empty MapState. The incoming records for map1 and map2 will have different keys, so in map1 you access sourceMap2 for a key (not the map key but the keyBy key) under which no key-value pairs have been stored. The same applies to map2, where you access sourceMap1 under a key for which no key-value pairs have been stored yet.
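The scoping described above can be illustrated without Flink at all. The following is only a toy model in plain Scala, not how Flink implements its state backend: `KeyedBackend`, `KeyedMapState`, `KeyedScopeDemo` and the sample keys "a" and "b" are all hypothetical names made up for this sketch. It keeps one inner map per keyBy key and only ever exposes the inner map for the current key.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the operator's notion of "the key of the record being processed".
final class KeyedBackend {
  var currentKey: String = ""
}

// Hypothetical stand-in for a keyed MapState: one inner map per keyBy key.
final class KeyedMapState[K, V](backend: KeyedBackend) {
  private val perKey = mutable.Map.empty[String, mutable.Map[K, V]]

  // Writes go to the inner map of the current key only.
  def put(k: K, v: V): Unit =
    perKey.getOrElseUpdate(backend.currentKey, mutable.Map.empty[K, V]).put(k, v)

  // Reads see the inner map of the current key only.
  def keys: Set[K] =
    perKey.get(backend.currentKey).map(_.keySet.toSet).getOrElse(Set.empty[K])
}

object KeyedScopeDemo {
  // Returns what sourceMap2 looks like from a record keyed "a" and from a record keyed "b".
  def run(): (Set[String], Set[String]) = {
    val backend    = new KeyedBackend
    val sourceMap1 = new KeyedMapState[String, Map[String, String]](backend)
    val sourceMap2 = new KeyedMapState[String, Map[String, String]](backend)

    // A stream-2 record whose keyBy key is "b" stores an entry in sourceMap2.
    backend.currentKey = "b"
    sourceMap2.put("k2", Map("key" -> "k2"))

    // A stream-1 record whose keyBy key is "a": sourceMap2 is empty in this scope.
    backend.currentKey = "a"
    sourceMap1.put("k1", Map("key" -> "k1"))
    val seenUnderOtherKey = sourceMap2.keys

    // A stream-1 record whose keyBy key is also "b": the entry is now visible.
    backend.currentKey = "b"
    val seenUnderSameKey = sourceMap2.keys

    (seenUnderOtherKey, seenUnderSameKey)
  }
}
```

In this model, map1 can only read what map2 wrote when both records share the same keyBy key, which matches the behaviour reported in the question.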

Your Transformer class is being applied to two connected, keyed streams. sourceMap1 and sourceMap2 are keyed state, meaning that you have a separate, nested hash map for every key of the two connected streams. One pair of these maps is in scope each time map1 or map2 is called, i.e., the pair corresponding to the key of the item being mapped.

If instead you want to have global state, shared across all the keys, have a look at the broadcast state pattern.
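As a rough sketch of what the broadcast state pattern could look like for this question, assuming it is acceptable to broadcast the second stream to every parallel instance: the names `Shared` and `SharedLookupFunction` are hypothetical, and merging the two maps in `processElement` is only an illustration of using the looked-up value, not something the question asks for.

```scala
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction
import org.apache.flink.util.Collector

object Shared {
  type Rec = (String, Map[String, String])
  type Out = Map[String, String]

  // Descriptor for the broadcast state; the name is reused from the question.
  val descriptor = new MapStateDescriptor[String, Map[String, String]](
    "sourceMap2", classOf[String], classOf[Map[String, String]])
}

// Hypothetical replacement for Transformer: stream 1 stays keyed, stream 2 is broadcast.
class SharedLookupFunction
    extends KeyedBroadcastProcessFunction[String, Shared.Rec, Shared.Rec, Shared.Out] {

  // Called for records of the keyed stream (the former map1 side).
  override def processElement(
      in1: Shared.Rec,
      ctx: KeyedBroadcastProcessFunction[String, Shared.Rec, Shared.Rec, Shared.Out]#ReadOnlyContext,
      out: Collector[Shared.Out]): Unit = {
    // Read-only view of the broadcast state; it is not scoped to the current key.
    val shared = ctx.getBroadcastState(Shared.descriptor)
    // get returns null when the broadcast side has not stored this entry yet.
    val fromStream2 = Option(shared.get(in1._2("key")))
    // Illustrative only: merge the stream-2 entry into the record if one is known.
    out.collect(fromStream2.fold(in1._2)(in1._2 ++ _))
  }

  // Called on every parallel instance for each record of the broadcast stream (the former map2 side).
  override def processBroadcastElement(
      in2: Shared.Rec,
      ctx: KeyedBroadcastProcessFunction[String, Shared.Rec, Shared.Rec, Shared.Out]#Context,
      out: Collector[Shared.Out]): Unit = {
    // Nothing is emitted here, since each parallel instance would emit its own copy.
    ctx.getBroadcastState(Shared.descriptor).put(in2._2("key"), in2._2)
  }
}
```

Wiring this up would look roughly like `stream1.keyBy(_._1).connect(stream2.broadcast(Shared.descriptor)).process(new SharedLookupFunction)`, where `stream1` and `stream2` stand for the two input streams from the question. Note that this is asymmetric: only the keyed side can read what the broadcast side has stored, and there is no ordering guarantee between the two inputs, so a lookup may miss an entry that arrives later.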

