Filtering Flink datastream to optional sub-object in Scala

We are using Flink in Scala to route and transform Protobuf events (using scalapb) in our analytics pipeline. I have a datastream of "PlayStreams" with this schema:

message PlayStream {
  optional PlayerEvent player_event = 1;
  optional BlockAccountIPEvent block_account_ip_event = 2;
}

The generated case class has a playerEvent member of type Option[PlayerEvent].

I want to transform the datastream into just PlayerEvents, filtering out any that do not have them. I'm new to Scala, so I'm not sure how to do this idiomatically. What I have currently works fine:

  // in main()
  getDataStream(name, env, config.get("KafkaSource"))
    .keyBy[String](PlayStreamFunctions.key(_))
    .map { _.getPlayerEvent }
    .filter(filterDefaultPlayerEvents(_))

  def filterDefaultPlayerEvents(playerEvent: PlayerEvent): Boolean = {
    playerEvent match {
      case PlayerEvent.defaultInstance => false
      case _ => true
    }
  }

This works because getPlayerEvent in the generated class is just playerEvent.getOrElse(PlayerEvent.defaultInstance), and we don't use the default instance for anything. However, it feels weird to create a bunch of references to the defaultInstance only to filter them out in the very next step. Is there a way to avoid that which I'm not seeing?
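To make the concern concrete, here is a minimal sketch on a plain Scala List rather than a Flink stream. The PlayerEvent and PlayStream case classes are hypothetical stand-ins for the scalapb-generated ones (the id field is invented for illustration), and getPlayerEvent is written out the way it is described above.

```scala
// Hypothetical stand-ins for the scalapb-generated classes; the `id` field is invented.
final case class PlayerEvent(id: String = "")
object PlayerEvent {
  val defaultInstance: PlayerEvent = PlayerEvent()
}

final case class PlayStream(playerEvent: Option[PlayerEvent] = None) {
  // Mirrors the generated accessor as described in the question.
  def getPlayerEvent: PlayerEvent = playerEvent.getOrElse(PlayerEvent.defaultInstance)
}

object DefaultInstanceFilterSketch {
  val streams: List[PlayStream] = List(
    PlayStream(Some(PlayerEvent("a"))),
    PlayStream(None),
    PlayStream(Some(PlayerEvent("b")))
  )

  // Same shape as the map + filter pipeline: every missing event first becomes
  // the default instance and is then dropped again by the filter.
  val viaDefault: List[PlayerEvent] =
    streams.map(_.getPlayerEvent).filter(_ != PlayerEvent.defaultInstance)
}
```

Because case classes compare by value, a filter like this would also drop an event that was present but had every field left at its default. That is only safe as long as the default instance really is never used for anything.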

I wanted to clarify that I scoped this question under Flink because all the map functions are Flink-specific implementations. I realized flatMap was available and, given that map operations are more idiomatic than pattern matching for Options, I went with this implementation:

  getDataStream(name, env, config.get("KafkaSource"))
    .keyBy[String](PlayStreamFunctions.key(_))
    .flatMap { _.playerEvent.toList }
    .flatMap(toFlatPlayerEvent(_))

Since toList returns an empty list if the Option is empty (None), or a single-element list containing the value if it is defined, flat mapping across them solves my problem.
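The toList behavior can be checked on ordinary Scala collections, independently of Flink. This is only a sketch: PlayerEvent and PlayStream are hypothetical stand-ins for the scalapb-generated classes (the id field is invented), and the collect variant is shown for plain collections only, not as a claim about the Flink DataStream API.

```scala
// Hypothetical stand-ins for the scalapb-generated classes; the `id` field is invented.
final case class PlayerEvent(id: String = "")
final case class PlayStream(playerEvent: Option[PlayerEvent] = None)

object OptionFlatMapSketch {
  val streams: List[PlayStream] = List(
    PlayStream(Some(PlayerEvent("a"))),
    PlayStream(None),
    PlayStream(Some(PlayerEvent("b")))
  )

  // None.toList is Nil and Some(x).toList is List(x), so flatMap drops the
  // missing events and unwraps the present ones in a single step.
  val viaToList: List[PlayerEvent] = streams.flatMap(_.playerEvent.toList)

  // On plain collections the same result can be written with collect.
  val viaCollect: List[PlayerEvent] =
    streams.collect { case PlayStream(Some(event)) => event }
}
```

Neither version ever materializes a default instance, which is the property the flatMap rewrite was after.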
