Apache Beam: Streaming data from Kafka to a GCS bucket (not using Pub/Sub)
I have seen a lot of Apache Beam examples where you read data from Pub/Sub and write to a GCS bucket, but is there any example of using KafkaIO and writing to a GCS bucket? One where I can parse the message and put it in the appropriate bucket based on the message content?
For example:
message = {type="type_x", some other attributes....}
message = {type="type_y", some other attributes....}
type_x --> goes to bucket x
type_y --> goes to bucket y
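The routing I have in mind is essentially a lookup from the message's `type` field to a destination. A minimal sketch of that decision (the bucket paths and the `type_x`/`type_y` keys are just placeholders from the example above):

```java
import java.util.Map;

public class BucketRouter {
    // Hypothetical mapping from a message's "type" field to a GCS output path.
    private static final Map<String, String> ROUTES = Map.of(
            "type_x", "gs://bucket-x/output",
            "type_y", "gs://bucket-y/output");

    // Returns the destination path for a message type, falling back to a
    // default path for unrecognized types.
    public static String routeFor(String type) {
        return ROUTES.getOrDefault(type, "gs://bucket-default/output");
    }

    public static void main(String[] args) {
        System.out.println(routeFor("type_x")); // gs://bucket-x/output
        System.out.println(routeFor("type_z")); // gs://bucket-default/output
    }
}
```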
My use case is streaming data from Kafka to a GCS bucket, so if someone can suggest a better way to do this in GCP, that is welcome too.
Thanks. Regards, Anant.
You can take a look at the example here: https://github.com/0x0ece/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
Once you have read the data elements, if you want to write to multiple destinations based on a specific data value, you can look at multiple outputs using `TupleTagList`, the details of which can be found here: https://beam.apache.org/documentation/programming-guide/#additional-outputs
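Putting the pieces together, a sketch of such a pipeline could look like the following. This is not a drop-in solution: the bootstrap servers, topic name, bucket paths, window size, and the crude `contains` check standing in for real message parsing are all assumptions, and it requires the `beam-sdks-java-io-kafka` dependency on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaToGcs {

  // Tags identifying the two output collections of the routing DoFn.
  static final TupleTag<String> TYPE_X = new TupleTag<String>() {};
  static final TupleTag<String> TYPE_Y = new TupleTag<String>() {};

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    PCollection<String> messages = p
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")            // assumption
            .withTopic("events")                           // assumption
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply(Values.create())
        // An unbounded source must be windowed before a file-based write.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))));

    PCollectionTuple routed = messages.apply(ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void process(ProcessContext c) {
        // Naive content check; real code would parse the message properly.
        if (c.element().contains("type_x")) {
          c.output(c.element());             // main output -> TYPE_X
        } else {
          c.output(TYPE_Y, c.element());     // additional output -> TYPE_Y
        }
      }
    }).withOutputTags(TYPE_X, TupleTagList.of(TYPE_Y)));

    routed.get(TYPE_X).apply(TextIO.write().to("gs://bucket-x/events")
        .withWindowedWrites().withNumShards(1));
    routed.get(TYPE_Y).apply(TextIO.write().to("gs://bucket-y/events")
        .withWindowedWrites().withNumShards(1));

    p.run();
  }
}
```

Note that `TextIO.write()` on an unbounded collection requires `withWindowedWrites()` and an explicit shard count; if you need per-element dynamic destinations rather than a fixed set of tags, `FileIO.writeDynamic()` is worth a look as well.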