How to write data from Kafka topic to file using KStreams?

I am trying to create a KStream application in Eclipse using Java. Right now I am adapting the word count example for Kafka Streams that is available on the internet.

What I want is for the data read from the input topic to be written to a file instead of to another output topic.

But when I try to print the KStream/KTable to a local file, I get the following entry in the output file:

org.apache.kafka.streams.kstream.internals.KStreamImpl@4c203ea1

How do I redirect the output from the KStream to a file?

Below is the code:

package KStreamDemo.kafkatest;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueMapper;

import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class TemperatureDemo {
public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "34.73.184.104:9092");
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    System.out.println("#1###################################################################################################################################################################################");
    // setting offset reset to earliest so that we can re-run the demo code with the same pre-loaded data
    // Note: To re-run the demo, you need to use the offset reset tool:
    // https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    StreamsBuilder builder = new StreamsBuilder();
    System.out.println("#2###################################################################################################################################################################################");
    KStream<String, String> source = builder.stream("iot-temperature");
    System.out.println("#5###################################################################################################################################################################################");
    KTable<String, Long> counts = source
        .flatMapValues(new ValueMapper<String, Iterable<String>>() {
            @Override
            public Iterable<String> apply(String value) {
                return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
            }
        })
        .groupBy(new KeyValueMapper<String, String, String>() {
            @Override
            public String apply(String key, String value) {
                return value;
            }
        })
        .count();
    System.out.println("#3###################################################################################################################################################################################");
    System.out.println("OUTPUT:"+ counts);
    System.out.println("#4###################################################################################################################################################################################");
    // need to override value serde to Long type
    counts.toStream().to("iot-temperature-max", Produced.with(Serdes.String(), Serdes.Long()));

    final KafkaStreams streams = new KafkaStreams(builder.build(), props);
    final CountDownLatch latch = new CountDownLatch(1);

    // attach shutdown handler to catch control-c
    Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
        @Override
        public void run() {
            streams.close();
            latch.countDown();
        }
    });

    try {
        streams.start();
        latch.await();
    } catch (Throwable e) {
        System.exit(1);
    }
    System.exit(0);
}

}

This is not correct:

System.out.println("OUTPUT:"+ counts);

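Concatenating counts into a string only calls toString() on the stream or table object, and it does so while the topology is still being built, before any record has been processed. That is why the file contains a class name and hash code instead of data.

You would need to call foreach on the stream (for the KTable here, counts.toStream().foreach(...)) and write each record out to a file from there.

See the related question "Print Kafka Stream Input out to console?" (just update it to write to a file instead).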
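As a minimal sketch of the foreach approach: the helper below is hypothetical (the class name RecordFileWriter, the tab-separated line format, and the temp file are assumptions, not part of the original code). It uses only the JDK so it can be tried without a broker; its append(key, value) method has the same shape as ForeachAction<String, Long>.apply, so it should be usable as a method reference in the topology.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: appends one "key<TAB>value" line per record to a local file. */
final class RecordFileWriter implements AutoCloseable {
    private final BufferedWriter out;

    RecordFileWriter(Path file) throws IOException {
        // Create the file if it is missing, otherwise append to it.
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Same parameter shape as ForeachAction<String, Long>.apply(key, value).
    // Assumed wiring in the topology from the question:
    //     counts.toStream().foreach(writer::append);
    synchronized void append(String key, Long value) {
        try {
            out.write(key + "\t" + value);
            out.newLine();
            out.flush(); // flush per record so the file can be read while the app runs
        } catch (IOException e) {
            // foreach cannot throw checked exceptions, so wrap the failure
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }

    // Stand-alone demonstration without Kafka: writes two records and prints the lines.
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("word-counts", ".txt");
        try (RecordFileWriter writer = new RecordFileWriter(file)) {
            writer.append("hello", 2L);
            writer.append("world", 1L);
        }
        System.out.println(Files.readAllLines(file));
    }
}
```

In the program from the question, the writer would be created before new KafkaStreams(...), registered with counts.toStream().foreach(writer::append) in place of the println, and closed in the shutdown hook after streams.close(). For quick debugging, Kafka Streams also provides Printed.toFile(...) for use with KStream.print(...).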


However, it is probably better to write the stream out to a topic and then use Kafka Connect to write that topic to a file. This is a more industry-standard pattern: Kafka Streams is meant to move data between topics within Kafka, not to integrate with external systems (or filesystems).

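Edit connect-file-sink.properties with the topic information you want, then start a standalone Connect worker, passing the worker configuration first and the connector configuration second (in the Apache Kafka distribution the script is named connect-standalone.sh):

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-sink.properties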
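For illustration, an edited connect-file-sink.properties might look roughly like the fragment below. The connector name and the output path are assumptions; the topic is the output topic from the question. Because that topic carries String keys and Long values, the converters are overridden at the connector level here, which is also an assumption about how the worker is otherwise configured.

```properties
# Hypothetical connector name
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
# Assumed output path; each record value is written as one line
file=/tmp/iot-temperature-max.txt
# Output topic written by the Streams application in the question
topics=iot-temperature-max
# The counts are serialized with Serdes.Long(), so read them back as longs
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.converters.LongConverter
```

In recent Kafka releases the FileStream connectors may need to be added to the worker's plugin.path before this configuration loads.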

