Using Apache Flink to consume from a Kafka topic then processing the stream with Flink CEP

In this project, I'm trying to consume data from a Kafka topic using Flink and then process the stream to detect a pattern using Flink CEP. The part that uses the Kafka connector works and data is being fetched, but the CEP part doesn't work for some reason. I'm using Scala in this project.

build.sbt:


version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.12.2"

libraryDependencies += "org.apache.kafka" %% "kafka" % "2.3.0"

libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % "1.12.2"


libraryDependencies += "org.apache.flink" %% "flink-cep-scala" % "1.12.2" 

The main code:

import java.util
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object flinkExample {
  def main(args: Array[String]): Unit = {

    val CLOSE_THRESHOLD: Double = 140.00

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("zookeeper.connect", "localhost:2181")
    properties.setProperty("group.id", "test")

    val consumer = new FlinkKafkaConsumer[String]("test", new SimpleStringSchema(), properties)
    consumer.setStartFromEarliest()

    val see: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    val src: DataStream[String] = see.addSource(consumer)

    
    // Parse each "date:value" record into a Stock event
    val keyedStream: DataStream[Stock] = src
      .map { v =>
        val data = v.split(":")
        val date = data(0)
        val close = data(1).toDouble
        Stock(date, close)
      }

    val pat = Pattern
      .begin[Stock]("start")
      .where(_.Adj_Close > CLOSE_THRESHOLD)

    val patternStream = CEP.pattern(keyedStream, pat)

    val result = patternStream.select(
      patternSelectFunction = new PatternSelectFunction[Stock, String]() {
        override def select(pattern: util.Map[String, util.List[Stock]]): String = {
          val data = pattern.get("first").get(0)

          data.toString
        }
      }
    )

    result.print()

    see.execute("ASK Flink Kafka")

  }

  case class Stock(date: String, Adj_Close: Double) {
    override def toString: String = s"Stock date: $date, Adj Close: $Adj_Close"
  }

}

The data coming from Kafka is in string format: "date:value".

Scala version: 2.11.12, Flink version: 1.12.2, Kafka version: 2.3.0.

I'm building the project with sbt assembly and then deploying the jar via the Flink dashboard.
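For reference, sbt assembly requires the sbt-assembly plugin to be registered in project/plugins.sbt; a minimal sketch might look like the following (the plugin version is illustrative, not taken from the project above):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")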

With pattern.get("first") you are selecting a pattern named "first" from the pattern sequence, but the pattern sequence only has one pattern, which is named "start".使用pattern.get("first")您正在从模式序列中选择一个名为“first”的模式,但模式序列只有一个模式,名为“start”。 Trying changing "first" to "start".尝试将“第一”更改为“开始”。

Also, CEP has to be able to sort the stream into temporal order in order to do pattern matching, so you should define a watermark strategy. For processing-time semantics you can use WatermarkStrategy.noWatermarks().
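A sketch of how that strategy could be attached to the stream before calling CEP.pattern (assuming processing-time semantics, as suggested above; assignTimestampsAndWatermarks is the standard DataStream API call in Flink 1.12):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy

    // No watermarks are generated; suitable when the pattern should be
    // evaluated in processing time rather than event time.
    val withWatermarks: DataStream[Stock] =
      keyedStream.assignTimestampsAndWatermarks(WatermarkStrategy.noWatermarks[Stock]())

    val patternStream = CEP.pattern(withWatermarks, pat)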
