Filter Flink tuples

I'm writing a program for stream processing in Scala using Flink. I have a DataStream which I first map to tuples containing json4s JValues. Now I want to filter these tuples based on those JValues. I thought this would be simple, but I can't find a good example of how to filter Flink tuples by their fields. Does anyone know how to do this? Thanks

Instead of mapping to tuples, you could simply map to case classes and filter out what you don't need:

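// StreamingJob.scala

import org.apache.flink.streaming.api.scala._ // implicit TypeInformation for map/filter

...

val filteredEvents = content
  .map(x => Event.toCaseClass(x)) // parse each JSON string into an Event
  .filter(x => x.value)           // keep only events whose flag is true

...

// Event.scala

import org.json4s._
import org.json4s.jackson.JsonMethods._ // assumes the Jackson backend; json4s-native also provides parse

// value is a Boolean here so the filter above type-checks;
// for an Int field, compare against a number instead (e.g. x.value == 1)
case class Event(
  id: String,
  value: Boolean
)

object Event {
  implicit val formats: Formats = DefaultFormats

  def toCaseClass(str: String): Event =
    parse(str).extract[Event]
}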

The question is a bit underspecified for me, but would something like this work?

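// Assumes the stream holds Flink Java tuples of json4s values, e.g. produced by
// a custom deserializer that turns a JSON array into a Tuple2
import org.apache.flink.api.java.tuple.Tuple2
import org.json4s.JString

val jsonExample = """["foo", "bar"]"""

val stream: DataStream[Tuple2[JString, JString]] = ???
// f0 is the first field of a Flink Tuple2; JString wraps its text in `s`
val filteredStream = stream.filter(x => x.f0.s == "foo")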

I'd say it would be better not to use Flink tuples if you are writing Scala, though. Go for case classes, or at least Scala tuples.
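
To illustrate that last suggestion, here is a minimal sketch of what the predicates might look like for a Scala tuple and for a case class. It runs on plain Scala Lists so that it needs no dependencies; the names TupleFilterSketch, firstIsFoo, isActive and this two-field Event are hypothetical, and plain Strings stand in for the json4s values. The assumption is that the same function values could be passed to filter on a DataStream from Flink's Scala API, which also takes a T => Boolean.

```scala
// Hypothetical record type standing in for a parsed JSON event.
final case class Event(id: String, value: Boolean)

object TupleFilterSketch {
  // Predicate on a Scala tuple: keep pairs whose first element is "foo".
  // Pattern matching names the fields instead of using _1 / _2.
  val firstIsFoo: ((String, String)) => Boolean = {
    case (first, _) => first == "foo"
  }

  // Predicate on a case class: keep events whose flag is set.
  val isActive: Event => Boolean = _.value

  def main(args: Array[String]): Unit = {
    val pairs  = List(("foo", "bar"), ("baz", "qux"), ("foo", "quux"))
    val events = List(Event("a", value = true), Event("b", value = false))

    // Plain collections stand in for the stream here.
    println(pairs.filter(firstIsFoo))
    println(events.filter(isActive).map(_.id))
  }
}
```

With a case class the filter reads as a statement about named fields rather than positions, which is probably the main argument for avoiding Flink's Java tuples in Scala code.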
