
Apache Beam Streaming data from Kafka to GCS Bucket (Not using pubsub)

I have seen a lot of Apache Beam examples where you read data from PubSub and write to a GCS bucket, but is there any example of using KafkaIO and writing to a GCS bucket, where I can parse the message and put it in the appropriate bucket based on the message content?

For example:

message = {type="type_x", some other attributes....}
message = {type="type_y", some other attributes....}

type_x --> goes to bucket x
type_y --> goes to bucket y

My use case is streaming data from Kafka to a GCS bucket, so if someone can suggest a better way to do this in GCP, that is welcome too.

Thanks. Regards, Anant.

You can use Secor to load messages to a GCS bucket. Secor is also able to parse incoming messages and put them under different paths in the same bucket.

You can take a look at the example here: https://github.com/0x0ece/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
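For reference, here is a minimal sketch of the KafkaIO read side of such a pipeline. The broker address, topic name, and class name are placeholder assumptions; adapt them to your setup.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToGcs {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    PCollection<String> messages =
        p.apply("ReadFromKafka",
                KafkaIO.<String, String>read()
                    .withBootstrapServers("kafka-broker:9092") // placeholder broker
                    .withTopic("my-topic")                     // placeholder topic
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withoutMetadata())                        // drop Kafka metadata, keep KV<key, value>
            .apply("ExtractValues", Values.<String>create());  // keep only the message payloads

    p.run().waitUntilFinish();
  }
}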

Once you have read the data elements, if you want to write to multiple destinations based on a specific data value, you can produce multiple outputs using a TupleTagList; the details can be found here: https://beam.apache.org/documentation/programming-guide/#additional-outputs
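A hedged sketch of that approach, continuing from the messages collection in the snippet above. The tag names, the type_x/type_y routing rule, and the bucket paths are all illustrative assumptions; note that streaming writes to GCS with TextIO require windowing plus windowed, sharded output.

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.joda.time.Duration;

// Tags identifying the two outputs (anonymous subclasses preserve the type).
final TupleTag<String> typeXTag = new TupleTag<String>() {};
final TupleTag<String> typeYTag = new TupleTag<String>() {};

PCollectionTuple routed = messages.apply("RouteByType",
    ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        String msg = c.element();
        // Hypothetical routing rule: inspect the message content for its type.
        if (msg.contains("type_x")) {
          c.output(msg);            // main output -> typeXTag
        } else {
          c.output(typeYTag, msg);  // additional output -> typeYTag
        }
      }
    }).withOutputTags(typeXTag, TupleTagList.of(typeYTag)));

// Window each branch, then write it to its own bucket.
routed.get(typeXTag)
    .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
    .apply(TextIO.write().to("gs://bucket-x/output")
        .withWindowedWrites().withNumShards(1));

routed.get(typeYTag)
    .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
    .apply(TextIO.write().to("gs://bucket-y/output")
        .withWindowedWrites().withNumShards(1));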
