Write Kafka Stream Output to Multiple Directories Using Apache Beam

I would like to persist the data from a Kafka topic to Google Cloud Storage using Dataflow.

I have written some sample code that runs locally, and it works fine.

public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    // Read (Long key, String value) records from the Kafka topic.
    p.apply(KafkaIO.<Long, String>read()
                    .withBootstrapServers("localhost:9092")
                    .withTopic("my-topic")
                    .withKeyDeserializer(LongDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class))
            // Group the unbounded stream into one-minute fixed windows.
            .apply(Window.<KafkaRecord<Long, String>>into(FixedWindows.of(Duration.standardMinutes(1))))
            // Split each record value into words (TextUtil is a helper not shown in the question).
            .apply(FlatMapElements.into(TypeDescriptors.strings())
                    .via((KafkaRecord<Long, String> line) -> TextUtil.splitLine(line.getKV().getValue())))
            // Drop empty tokens, then count the occurrences of each word per window.
            .apply(Filter.by((String word) -> StringUtils.isNotEmpty(word)))
            .apply(Count.perElement())
            // Format each (word, count) pair as "word: count".
            .apply(MapElements.into(TypeDescriptors.strings())
                    .via((KV<String, Long> lineCount) -> lineCount.getKey() + ": " + lineCount.getValue()))
            // Write one shard per window, using the given path as the filename prefix.
            .apply(TextIO.write().withWindowedWrites().withNumShards(1)
                    .to("resources/temp/wc-kafka-op/wc"));

    p.run().waitUntilFinish();
}

The code above works perfectly, but I would like to save the output of each window in a separate directory,

e.g. {BasePath}/{Window}/{prefix}{Suffix}

I have not been able to get this working.

TextIO supports windowed writes (withWindowedWrites), which let you specify how the name of each output file is derived. See the JavaDoc.
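
As a rough sketch of the naming logic only (not the complete Beam wiring): one way to get the {BasePath}/{Window}/{prefix}{Suffix} layout is to compute the path from the window bounds and the shard numbers. In Beam this logic would typically live inside a custom FileBasedSink.FilenamePolicy, in its windowedFilename method, and the policy would be passed to TextIO.write().to(...) together with withTempDirectory(...). The class WindowedPaths and its methods below are hypothetical names made up for illustration; they are not part of Beam. The sketch uses java.time, whereas Beam's IntervalWindow exposes Joda-Time instants, so a conversion such as Instant.ofEpochMilli(window.start().getMillis()) is assumed at the call site.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/**
 * Hypothetical helper (not a Beam class): builds one output path per window,
 * laid out as {basePath}/{window}/{prefix}-SSSSS-of-NNNNN{suffix}.
 */
final class WindowedPaths {
    // Colon-free UTC timestamp so the window label is safe to use as a directory name.
    private static final DateTimeFormatter LABEL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss").withZone(ZoneOffset.UTC);

    private WindowedPaths() {}

    /** Directory name for a window, derived from its start and end instants. */
    static String windowLabel(Instant start, Instant end) {
        return LABEL.format(start) + "_" + LABEL.format(end);
    }

    /** Full relative path of one shard of one window. */
    static String windowedPath(String basePath, Instant start, Instant end,
                               String prefix, int shard, int numShards, String suffix) {
        // Avoid a double slash when the base path already ends with one.
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return String.format(Locale.ROOT, "%s/%s/%s-%05d-of-%05d%s",
                base, windowLabel(start, end), prefix, shard, numShards, suffix);
    }
}
```

With the one-minute windows and single shard from the question, each window would then map to its own directory under resources/temp/wc-kafka-op, containing a single wc-00000-of-00001 file. Inside a real FilenamePolicy the resulting string would still need to be resolved into a ResourceId against the output directory.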
