
Apache Beam HTTP Unbounded Source Python

2021-04-16 08:15:18 · python / http / apache-beam / apache-beam-io / apache-beam-kafkaio

Is it possible with the current version of Apache Beam to develop an unbounded source that receives data in an HTTP message? My intention is to run an HTTP server and to inject the messages it receives into a Beam pipeline. If it is possible, can it be done with the existing sources?
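To make the intent concrete, here is a minimal sketch of the receiving half only, using nothing but the Python standard library. It assumes messages arrive as POST bodies and parks them in an in-process `queue.Queue`; the names `messages`, `IngestHandler`, `start_server` and `post` are hypothetical, and how the queue would be drained into a Beam pipeline is deliberately left open.

```python
import http.client
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process buffer between the HTTP server and whatever reads it.
messages = queue.Queue()


class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        messages.put(self.rfile.read(length))
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(host="127.0.0.1", port=0):
    """Start the server on a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), IngestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post(port, payload):
    """Send one message to the local server and return the HTTP status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("POST", "/", body=payload)
    status = conn.getresponse().status
    conn.close()
    return status


if __name__ == "__main__":
    server = start_server()
    post(server.server_address[1], b'{"id": 1}')
    print(messages.get(timeout=5))
    server.shutdown()
    server.server_close()
```

The queue only decouples the HTTP threads from the consumer. It is not durable and lives in a single process, so on its own it says nothing about how a distributed runner would checkpoint or scale the read.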

1 Solution

It is possible. You can develop it by leveraging Splittable DoFn. The Source API looks like it is going to be deprecated in the near future.

From my end, I am trying to develop such a pipeline, one that would consume a REST API that streams JSON messages in the body of a GET response and supports multiple connections, so the workload is split on the API side, as with Adobe Livestream or Twitter. This behaviour should enable scaling on the consumer end (Dataflow).

My struggle is that I can't work out a splittable restriction for this use case. The stream is infinite, and there is no offset as in message brokers like Kafka, nor a byte range as with files. I first wanted to build element/restriction pairs like (url, buffered reader), but I don't think buffered readers can be split.

One solution might be not to provide a restriction at all. But then I struggle to imagine how the pipeline would distribute elements and hence scale.
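One way to picture the "no restriction" variant is to let the elements themselves carry the parallelism: one element per connection the API allows, each read independently. The sketch below is plain Python, not Beam API. `connection_elements`, `read_connection` and `take` are hypothetical names, the stream is synthetic, and the thread pool merely stands in for whatever workers a runner would assign. It also assumes the API accepts a fixed, known number of parallel connections.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Dict, Iterator, List, Tuple

Element = Tuple[str, int]  # (stream URL, connection index)


def connection_elements(url: str, num_connections: int) -> List[Element]:
    """Build one element per allowed connection; each is an independent unit of work."""
    return [(url, i) for i in range(num_connections)]


def read_connection(element: Element) -> Iterator[Dict]:
    """Stand-in for an endless streaming GET: yields synthetic messages forever."""
    url, index = element
    seq = 0
    while True:
        yield {"url": url, "connection": index, "seq": seq}
        seq += 1


def take(element: Element, n: int) -> List[Dict]:
    """Read the first n messages of one connection (only so the demo terminates)."""
    return list(islice(read_connection(element), n))


if __name__ == "__main__":
    elements = connection_elements("https://example.invalid/stream", 3)
    # The thread pool plays the role of workers, one per element.
    with ThreadPoolExecutor(max_workers=len(elements)) as pool:
        for batch in pool.map(lambda e: take(e, 2), elements):
            print(batch)
```

In this picture the degree of parallelism is fixed by the number of elements, so scaling beyond the number of connections would still need some other mechanism, which is exactly the open question above.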