
Apache Beam: streaming data from Kafka to a GCS bucket (not using Pub/Sub)

I have seen a lot of Apache Beam examples that read data from Pub/Sub and write to a GCS bucket. Is there an example that uses KafkaIO and writes to a GCS bucket, where I can parse each message and put it in the appropriate bucket based on the message content?

For example:

message = {type="type_x", some other attributes....}
message = {type="type_y", some other attributes....}

type_x --> goes to bucket x
type_y --> goes to bucket y

My use case is streaming data from Kafka to a GCS bucket, so if someone can suggest a better way to do this in GCP, that is welcome too.

Thanks. Regards, Anant.

You can use Secor to load messages into a GCS bucket. Secor can also parse incoming messages and put them under different paths in the same bucket.

You can take a look at this example: https://github.com/0x0ece/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java

Once you have read the data elements, if you want to write to multiple destinations based on a specific data value, you can use multiple outputs with TupleTagList. The details are here: https://beam.apache.org/documentation/programming-guide/#additional-outputs
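To make the routing idea concrete, here is a minimal sketch in plain Java (no Beam dependency) of the per-element decision that a multi-output DoFn would make. The class name `BucketRouter`, the bucket names `gs://bucket-x` and `gs://bucket-y`, and the regex-based extraction of `type` are all assumptions for illustration, based only on the message shape shown in the question. Real messages, for instance JSON, would need a proper parser.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BucketRouter {
    // Hypothetical mapping from message type to GCS bucket; the names are placeholders.
    private static final Map<String, String> BUCKETS = Map.of(
            "type_x", "gs://bucket-x",
            "type_y", "gs://bucket-y");

    // Matches type="..." as written in the question's example messages.
    private static final Pattern TYPE = Pattern.compile("type\\s*=\\s*\"([^\"]+)\"");

    // Returns the value of the type attribute, or empty if the message has none.
    static Optional<String> typeOf(String message) {
        Matcher m = TYPE.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the destination bucket, or empty for a missing or unknown type.
    static Optional<String> bucketFor(String message) {
        return typeOf(message).map(BUCKETS::get);
    }

    public static void main(String[] args) {
        String[] samples = {
            "{type=\"type_x\", id=1}",
            "{type=\"type_y\", id=2}",
            "{type=\"type_z\", id=3}"
        };
        for (String s : samples) {
            // Unknown types fall through to a placeholder "unrouted" label.
            System.out.println(s + " -> " + bucketFor(s).orElse("unrouted"));
        }
    }
}
```

In a Beam pipeline, a decision like this would typically sit inside the DoFn's `@ProcessElement` method, with one TupleTag per destination and the element emitted to the matching tag (for example via `ProcessContext.output(TupleTag, T)`). Each resulting PCollection could then be windowed and written to its own `gs://` path. Treat that wiring as an outline rather than a tested pipeline.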

