
Akka Streams: State in a flow

I want to read multiple big files using Akka Streams and process each line. Imagine that each line consists of an (identifier -> value) pair. If a new identifier is found, I want to save it and its value in the database; otherwise, if the identifier has already been seen while processing the stream of lines, I want to save only the value. For that, I think I need some kind of recursive stateful flow that keeps the identifiers already found in a Map. I think this flow would receive a pair of (newLine, contextWithIdentifiers).

I've just started to look into Akka Streams. I think I can manage the stateless processing myself, but I have no clue how to keep the contextWithIdentifiers. I'd appreciate any pointers in the right direction.

Maybe something like statefulMapConcat can help you:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.util.Random._
import scala.math.abs
import scala.concurrent.ExecutionContext.Implicits.global

implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()

//encapsulating your input
case class IdentValue(id: Int, value: String)
//some random generated input
val identValues = List.fill(20)(IdentValue(abs(nextInt()) % 5, "valueHere"))

val stateFlow = Flow[IdentValue].statefulMapConcat{ () =>
  //state with already processed ids
  var ids = Set.empty[Int]
  identValue => if (ids.contains(identValue.id)) {
    //save value to DB
    println(identValue.value)
    List(identValue)
  } else {
    //save both to database
    println(identValue)
    ids = ids + identValue.id
    List(identValue)
  }
}

Source(identValues)
  .via(stateFlow)
  .runWith(Sink.seq)
  //Future.onSuccess was removed in Scala 2.13; foreach runs only on success as well
  .foreach(result => println(result))
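The question's idea of passing a (newLine, contextWithIdentifiers) pair along can also be written without a mutable var, by threading the set of seen ids through a scan. Below is a minimal sketch on a plain Scala Iterator rather than on a stream; the names SeenIds, Action, SaveBoth, SaveValueOnly, step and classify are hypothetical and only serve the illustration. Akka Streams offers a scan operator with the same shape (a zero element plus a step function), so the step function here should carry over, but treat that as an assumption to verify against your Akka version.

```scala
object SeenIds {
  final case class IdentValue(id: Int, value: String)

  // What should be written to the database for one element (illustrative types).
  sealed trait Action
  final case class SaveBoth(id: Int, value: String) extends Action
  final case class SaveValueOnly(value: String) extends Action

  // One step: given the ids seen so far and the next element,
  // return the updated set together with the action to perform.
  def step(seen: Set[Int], iv: IdentValue): (Set[Int], Action) =
    if (seen.contains(iv.id)) (seen, SaveValueOnly(iv.value))
    else (seen + iv.id, SaveBoth(iv.id, iv.value))

  // Thread the context through the input. scanLeft also emits the initial
  // state (which carries no action), so that first element is filtered out.
  def classify(input: Iterator[IdentValue]): Iterator[Action] =
    input
      .scanLeft((Set.empty[Int], Option.empty[Action])) {
        case ((seen, _), iv) =>
          val (next, action) = step(seen, iv)
          (next, Some(action))
      }
      .collect { case (_, Some(action)) => action }
}
```

Compared with statefulMapConcat, the state is explicit in the element type instead of hidden in a closure, which makes the step function easy to test on its own.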

A few years later, here is an implementation I wrote if you only need a 1-to-1 mapping (not 1-to-N):

import akka.stream.stage.{GraphStage, GraphStageLogic}
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}

object StatefulMap {
  def apply[T, O](converter: => T => O) = new StatefulMap[T, O](converter)
}

class StatefulMap[T, O](converter: => T => O) extends GraphStage[FlowShape[T, O]] {
  val in = Inlet[T]("StatefulMap.in")
  val out = Outlet[O]("StatefulMap.out")
  val shape = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
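    // converter is a by-name parameter, so it is evaluated here once per
    // materialization and every materialized stream gets its own function (and state)
    val f = converter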
    setHandler(in, () => push(out, f(grab(in))))
    setHandler(out, () => pull(in))
  }
}

Test (and demo):

  behavior of "StatefulMap"

  class Counter extends (Any => Int) {
    var count = 0

    override def apply(x: Any): Int = {
      count += 1
      count
    }
  }

  it should "not share state among substreams" in {
    val result = await {
      Source(0 until 10)
        .groupBy(2, _ % 2)
        .via(StatefulMap(new Counter()))
        .fold(Seq.empty[Int])(_ :+ _)
        .mergeSubstreams
        .runWith(Sink.seq)
    }
    result.foreach(_ should be(1 to 5))
  }
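The test passes because the converter is declared by-name (converter: => T => O), so new Counter() is evaluated again each time a stage logic is created. Here is a small sketch of that effect in plain Scala, with no Akka involved; ByNameDemo, ByName, ByValue and materialize are hypothetical stand-ins, where each call to materialize() plays the role of one createLogic invocation.

```scala
object ByNameDemo {
  // Same counter as in the test above: the state lives inside the function object.
  class Counter extends (Any => Int) {
    var count = 0
    override def apply(x: Any): Int = {
      count += 1
      count
    }
  }

  // By-name parameter: the argument expression is re-evaluated on every use,
  // so each materialize() call yields a fresh Counter.
  class ByName(converter: => Any => Int) {
    def materialize(): Any => Int = converter
  }

  // By-value parameter: the argument is evaluated once, so every
  // materialize() call returns the same Counter instance.
  class ByValue(converter: Any => Int) {
    def materialize(): Any => Int = converter
  }
}
```

With ByName, two materialized functions count independently; with ByValue they share one counter, which is the kind of state sharing among substreams that the test guards against.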
