We use Flink in Scala to route and transform Protobuf events (via ScalaPB) in our analytics pipeline. I have a DataStream of "PlayStreams" with this schema:
message PlayStream {
  optional PlayerEvent player_event = 1;
  optional BlockAccountIPEvent block_account_ip_event = 2;
}
The resulting generated case class has a playerEvent member of type Option[PlayerEvent].
I want to transform the datastream into just PlayerEvents, filtering out any that do not have them. I'm new to Scala, so I'm not sure how to do this idiomatically. What I have currently works fine:
// in main()
getDataStream(name, env, config.get("KafkaSource"))
  .keyBy[String](PlayStreamFunctions.key(_))
  .map { _.getPlayerEvent }
  .filter(filterDefaultPlayerEvents(_))

def filterDefaultPlayerEvents(playerEvent: PlayerEvent): Boolean = {
  playerEvent match {
    case PlayerEvent.defaultInstance => false
    case _ => true
  }
}
This works because getPlayerEvent in the generated class is just playerEvent.getOrElse(PlayerEvent.defaultInstance), and we don't use the default instance for anything. However, it feels weird to create a bunch of references to defaultInstance only to filter them out in the very next step. Is there a way to avoid that which I'm not seeing?
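To make the behavior described above concrete, here is a minimal sketch on plain Scala collections. The PlayerEvent and PlayStream case classes and the playerId field are hypothetical stand-ins for the ScalaPB-generated classes, and a List stands in for the Flink DataStream.

```scala
object DefaultInstanceFilterSketch {
  // Hypothetical stand-ins for the generated classes (assumption: one String field).
  final case class PlayerEvent(playerId: String = "")
  object PlayerEvent { val defaultInstance: PlayerEvent = PlayerEvent() }

  final case class PlayStream(playerEvent: Option[PlayerEvent] = None) {
    // Same shape as the generated accessor described in the question.
    def getPlayerEvent: PlayerEvent = playerEvent.getOrElse(PlayerEvent.defaultInstance)
  }

  val streams: List[PlayStream] = List(
    PlayStream(Some(PlayerEvent("alice"))),
    PlayStream(None),                // no player event at all
    PlayStream(Some(PlayerEvent()))  // present, but every field is at its default
  )

  // map + filter, as in the question: absent events become defaultInstance, then get dropped.
  val kept: List[PlayerEvent] =
    streams.map(_.getPlayerEvent).filter(_ != PlayerEvent.defaultInstance)
}
```

Because case class equality is structural, this filter also drops an event that is present but has every field at its default value, which is only safe under the stated assumption that the default instance is never used for anything.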
To clarify, I scoped this question under Flink because all the map functions here are Flink-specific implementations. I realized that flatMap was available and, given that map-style operations are more idiomatic than pattern matching for Options, I went with this implementation:
getDataStream(name, env, config.get("KafkaSource"))
  .keyBy[String](PlayStreamFunctions.key(_))
  .flatMap { _.playerEvent.toList }
  .flatMap(toFlatPlayerEvent(_))
Since toList returns an empty list when the Option is empty (None) and a single-element list containing the value when it is defined, flat mapping across those lists solves my problem.
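The same idea can be tried outside Flink. The sketch below uses a plain List in place of the DataStream, and the case classes and playerId field are hypothetical stand-ins for the generated ones. On standard Scala collections the toList call is optional, and collect is another way to express the filter; whether Flink's flatMap accepts an Option directly is not verified here.

```scala
object OptionFlatMapSketch {
  // Hypothetical stand-ins for the ScalaPB-generated classes.
  final case class PlayerEvent(playerId: String = "")
  final case class PlayStream(playerEvent: Option[PlayerEvent] = None)

  val streams: List[PlayStream] = List(
    PlayStream(Some(PlayerEvent("alice"))),
    PlayStream(None),
    PlayStream(Some(PlayerEvent("bob")))
  )

  // As in the post: None becomes an empty list, Some(e) becomes List(e).
  val viaToList: List[PlayerEvent] = streams.flatMap(_.playerEvent.toList)

  // On plain collections, an Option can be flattened directly.
  val direct: List[PlayerEvent] = streams.flatMap(_.playerEvent)

  // Pattern-matching alternative that keeps only the defined events.
  val viaCollect: List[PlayerEvent] = streams.collect { case PlayStream(Some(e)) => e }
}
```

All three variants keep only the streams that actually carry a player event, and no default instance is created along the way.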