
Apache Beam HTTP Unbounded Source Python

Is it possible, with the current version of Apache Beam, to develop an unbounded source that receives data in HTTP messages? My intention is to run an HTTP server and inject the messages it receives into a Beam pipeline. If it is possible, can it be done with the existing sources?
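To make the "run an HTTP server and inject the messages" idea concrete, here is a minimal standard-library sketch of the receiving half only. It is not Beam code: the handler name `IngestHandler`, the in-process `messages` queue, and the `demo` helper are all hypothetical. On a distributed runner an in-process queue is not shared between workers, so in practice the queue would likely stand in for an external broker (for example Pub/Sub, for which Beam already ships an unbounded source).

```python
import http.client
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical hand-off buffer between the HTTP server and a consumer.
messages: "queue.Queue[bytes]" = queue.Queue()


class IngestHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and enqueues each request body unchanged."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        messages.put(body)          # enqueue before acknowledging
        self.send_response(204)     # No Content: nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port: int = 0) -> HTTPServer:
    """Starts the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), IngestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo() -> bytes:
    """Posts one JSON message to the server and returns what was enqueued."""
    server = start_server()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("POST", "/", body=b'{"id": 1}',
                     headers={"Content-Type": "application/json"})
        conn.getresponse().read()
        conn.close()
        return messages.get(timeout=5)
    finally:
        server.shutdown()
        server.server_close()
```

Whatever reads from `messages` (or from the broker replacing it) is then the actual unbounded source from the pipeline's point of view.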

It is possible: you can develop it by leveraging Splittable DoFn. The Source API looks like it is going to be deprecated in the near future.

For my part, I am trying to develop such a pipeline: one that would consume a REST API that streams JSON messages in the body of a GET response and supports multiple connections, so that the workload is split on the API side, as with Adobe Livestream or Twitter. This behaviour should enable scaling on the consumer end (Dataflow).

My struggle is that I can't figure out a splittable restriction for this use case. The stream is infinite, and there is no offset as in message brokers like Kafka, nor a byte range as with files. I first wanted to build element/restriction pairs like (url, buffered reader), but I don't think buffered readers can be split.

One solution might be not to provide a restriction at all. But then I struggle to imagine how the pipeline would distribute elements and hence scale.
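One way to picture a restriction here, purely as a sketch: if the API caps the number of parallel connections and partitions the stream across them (an assumption about the API, not something confirmed above), the restriction could be a range of connection slots rather than a position in the stream. The `ConnectionSlots` class, the `element_restriction_pairs` helper, and the example URL below are all hypothetical; in Beam, an offset-style range restriction would play a similar role. This only models how work could be divided, and it does not address checkpointing inside a single infinite connection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConnectionSlots:
    """Half-open range [start, stop) of connection slots (hypothetical)."""
    start: int
    stop: int

    def size(self) -> int:
        return self.stop - self.start

    def split(self, slots_per_split: int = 1) -> List["ConnectionSlots"]:
        """Cuts the range into consecutive sub-ranges of at most the given size."""
        if slots_per_split < 1:
            raise ValueError("slots_per_split must be >= 1")
        return [
            ConnectionSlots(s, min(s + slots_per_split, self.stop))
            for s in range(self.start, self.stop, slots_per_split)
        ]


def element_restriction_pairs(
    url: str, max_connections: int
) -> List[Tuple[str, ConnectionSlots]]:
    """Pairs the URL with one single-slot restriction per allowed connection."""
    whole = ConnectionSlots(0, max_connections)
    return [(url, part) for part in whole.split(1)]


# Assumed example values: a placeholder URL and four allowed connections.
pairs = element_restriction_pairs("https://example.invalid/stream", 4)
```

Each pair could then be handed to a different worker, which would open one connection for its slot, so the degree of parallelism is bounded by the number of connections the API allows.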
