
Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor that provides real-time streams. I would like to process this stream using Akka Streams.

The Java API has a pub-sub design and roughly works like this:

Subscription sub = createSubscription();
sub.addListener(new Listener() {
    public void eventsReceived(List<Event> events) {
        for (Event e : events) {
            buffer.enqueue(e);
        }
    }
});

I have tried to embed the creation of this subscription and its accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?

To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements, which will then propagate through the stream.
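As a point of reference, here is a minimal sketch of that basic pattern on its own (the String elements and println sink are stand-ins for real event handling): running the stream materializes a queue handle, and anything offered to that handle is emitted by the source.

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object QueueSketch extends App {
  implicit val system: ActorSystem = ActorSystem("queue-sketch")

  // Running the graph materializes the queue; elements offered to it
  // are emitted by the source and flow through the rest of the stream.
  val queue = Source
    .queue[String](100)
    .to(Sink.foreach(println))
    .run()

  queue.offer("event-1") // returns a QueueOfferResult synchronously
  queue.offer("event-2")
}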

There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:

def bufferSize: Int = ???

Source.fromMaterializer { (mat, att) =>
  val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
  val subscription = createSubscription()
  subscription.addListener(
    new Listener() {
      def eventsReceived(events: java.util.List[Event]): Unit = {
        import scala.collection.JavaConverters.iterableAsScalaIterable
        import akka.stream.QueueOfferResult._

        iterableAsScalaIterable(events).foreach { event =>
          queue.offer(event) match {
            case Enqueued => ()     // do nothing
            case Dropped => ???     // handle a dropped pubsub element, might well do nothing
            case Failure(ex) => ??? // the queue itself failed; log and probably tear down
            case QueueClosed => ??? // presumably cancel the subscription...
          }
        }
      }
    }
  )

  source.withAttributes(att)
}

Source.fromMaterializer is used to get access, at each materialization, to the materializer (which is what compiles the stream definition into actors). When we materialize, we use that materializer to preMaterialize the queue source so we have access to the queue. Our subscription then adds incoming elements to the queue.

The API for this pub-sub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match so that you make an explicit decision here.

Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those communicate dropping asynchronously, which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand, as in the sketch below. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
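A minimal sketch of those two alternatives, assuming pubSubSource stands for the Source returned by Source.fromMaterializer above and merge is a placeholder for whatever combining two Events means in your domain:

import akka.NotUsed
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.Source

import scala.concurrent.Future

// Placeholders: the wrapped source from above and a domain-specific combiner.
def pubSubSource: Source[Event, Future[NotUsed]] = ???
def merge(older: Event, newer: Event): Event = ???

val downstreamBufferSize = 16

// Option 1: an explicit buffer that drops the *oldest* not-yet-processed
// event when full, instead of the queue dropping the newest on arrival.
val droppingOldest: Source[Event, Future[NotUsed]] =
  pubSubSource.buffer(downstreamBufferSize, OverflowStrategy.dropHead)

// Option 2: conflate events whenever the downstream can't keep up, so a slow
// consumer sees combined events rather than dropped ones.
val conflated: Source[Event, Future[NotUsed]] =
  pubSubSource.conflate(merge)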

Great answer! I did build something similar. There are also Kamon metrics to monitor queue size, etc.

import java.util.concurrent.Executor

import akka.NotUsed
import akka.stream.QueueOfferResult
import akka.stream.scaladsl.Source
import com.google.cloud.pubsub.v1.{AckReplyConsumer, MessageReceiver, Subscriber}
import com.google.pubsub.v1.{ProjectSubscriptionName, PubsubMessage}
import org.slf4j.LoggerFactory

import scala.concurrent.Future

class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
  private val logger = LoggerFactory.getLogger(getClass)

  def bufferSize: Int = 1000

  def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
    Source.fromMaterializer { (mat, attr) =>
      val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)

      val receiver: MessageReceiver = {
        (message: PubsubMessage, consumer: AckReplyConsumer) => {
          metricsRegistry.inputEventQueueSize.update(queue.size())
          queue.offer((message, consumer)) match {
            case QueueOfferResult.Enqueued => 
              metricsRegistry.inputQueueAddEventCounter.increment()
            case QueueOfferResult.Dropped =>
              metricsRegistry.inputQueueDropEventCounter.increment()
              consumer.nack()
              logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
            case QueueOfferResult.Failure(ex) =>
              metricsRegistry.inputQueueDropEventCounter.increment()
              consumer.nack()
              logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
            case QueueOfferResult.QueueClosed => 
              logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
              consumer.nack()
              mat.shutdown()
              sys.exit(1)
          }
        }
      }

      val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
      val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
      subscriber.startAsync().awaitRunning()
      source.withAttributes(attr)
    }
  }
}
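Wiring that class into a running stream might look something like the following sketch; the project and subscription IDs, registry, and processMessage are placeholders rather than part of the original answer:

import java.util.concurrent.Executor

import akka.actor.ActorSystem
import akka.stream.scaladsl.Sink
import com.google.pubsub.v1.PubsubMessage

import scala.concurrent.{ExecutionContextExecutor, Future}

implicit val system: ActorSystem = ActorSystem("pubsub-consumer")
// system.dispatcher serves both as the ExecutionContext for Future combinators
// and as the java.util.concurrent.Executor required by AsyncSubscriber.
implicit val ec: ExecutionContextExecutor = system.dispatcher

// Placeholders for a real metrics registry and business logic.
val registry: CustomMetricsRegistry = ???
def processMessage(message: PubsubMessage): Future[Unit] = ???

new AsyncSubscriber("my-project", "my-subscription", registry, pullParallelism = 4)
  .source()
  .mapAsync(parallelism = 8) { case (message, consumer) =>
    processMessage(message)
      .map(_ => consumer.ack())              // ack only after successful processing
      .recover { case _ => consumer.nack() } // let Pub/Sub redeliver on failure
  }
  .runWith(Sink.ignore)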
