Kafka-Streaming: How to collect pairs of messages and write them to a new topic

Question

This is a beginner's question about Kafka Streams.

How would you collect pairs of messages using the Java Kafka Streams library and write them to a new output topic?

I was thinking about something like this:

private void accumulateTwo(KStream<String, String> messages) {
    Optional<String> accumulator = Optional.empty();
    messages.mapValues(value -> {
        if (accumulator.isPresent()) {
            String tmp = accumulator.get();
            accumulator = Optional.empty();
            return Optional.of(new Tuple<>(tmp, value));
        }
        else {
            accumulator = Optional.of(value);
            return Optional.empty();
        }
    }).filter((key, value) -> value.isPresent()).to("pairs");
}

Yet this will not compile, since local variables captured by a Java lambda expression must be final or effectively final, and accumulator is reassigned inside the lambda.
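To illustrate the restriction in plain Java, without Kafka: a lambda may not reassign a captured local variable, but it may mutate an object that the variable refers to. The sketch below uses an AtomicReference as such a holder. The class name LambdaCaptureSketch and the "left,right" string format are made up for illustration, and this only demonstrates the language rule; the pending value lives in memory only, so it is not by itself a fault-tolerant approach for a streams application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Kafka Streams.
final class LambdaCaptureSketch {

    // Pairs up consecutive values; a trailing unpaired value is left pending.
    static List<String> pairUp(List<String> values) {
        // The variable `pending` is never reassigned, so the lambda may capture it.
        // Only the object it refers to is mutated.
        AtomicReference<String> pending = new AtomicReference<>();
        List<String> pairs = new ArrayList<>();
        values.forEach(value -> {
            String left = pending.getAndSet(null);
            if (left == null) {
                pending.set(value);            // first half of a pair: remember it
            } else {
                pairs.add(left + "," + value); // second half: emit the pair
            }
        });
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(pairUp(List.of("a", "b", "c", "d", "e"))); // [a,b, c,d]
    }
}
```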

Any ideas?

Answer 1

EDIT:

As suggested in the comments, three additional steps are necessary:

1. The Transformer must explicitly store its state within a state store. It obtains a reference to the state store from the ProcessorContext that is passed to its init method.
2. The state store must be registered with the StreamsBuilder.
3. The name of the state store must be passed to the transform(...) call on the stream.

In this example it is sufficient to store the last message we have seen. We use a KeyValueStore for this, which holds either zero entries or exactly one entry at any point in time.

public class PairTransformerSupplier<K,V> implements TransformerSupplier<K, V, KeyValue<K, Pair<V,V>>> {

    // Name of the state store that every Transformer created by this supplier will use.
    private final String storeName;

    public PairTransformerSupplier(String storeName) {
        this.storeName = storeName;
    }

    @Override
    public Transformer<K, V, KeyValue<K, Pair<V, V>>> get() {
        return new PairTransformer<>(storeName);
    }
}


public class PairTransformer<K,V> implements Transformer<K, V, KeyValue<K, Pair<V, V>>> {
    // The store holds at most one entry, always under this key.
    private static final int PENDING_KEY = 1;

    private final String storeName;
    private KeyValueStore<Integer, V> stateStore;

    public PairTransformer(String storeName) {
        this.storeName = storeName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        stateStore = (KeyValueStore<Integer, V>) context.getStateStore(storeName);
    }

    @Override
    public KeyValue<K, Pair<V, V>> transform(K key, V value) {
        // 1. Update the store to remember the last message seen.
        V left = stateStore.get(PENDING_KEY);
        if (left == null) {
            // First half of a pair: remember it and emit nothing.
            stateStore.put(PENDING_KEY, value);
            return null;
        }
        // Second half: clear the store and emit the pair under the second message's key.
        stateStore.delete(PENDING_KEY);
        return KeyValue.pair(key, new Pair<>(left, value));
    }

    @Override
    public void close() { }

}


public KStream<String, String> sampleStream(StreamsBuilder builder) {
    KStream<String, String> messages = builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()));
    // 2. Create the state store and register it with the streams builder.
    StoreBuilder<KeyValueStore<Integer, String>> storeBuilder = Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore(stateStoreName),
            Serdes.Integer(),
            Serdes.String()
    );
    builder.addStateStore(storeBuilder);
    transformToPairs(messages);
    return messages;
}

private void transformToPairs(KStream<String, String> messages) {
    // 3. Reference the name of the state store when calling transform(...).
    KStream<String, Pair<String, String>> pairs = messages.transform(
            new PairTransformerSupplier<>(stateStoreName),
            stateStoreName
    );
    KStream<String, Pair<String, String>> filtered = pairs.filter((key, value) -> value != null);
    KStream<String, String> serialized = filtered.mapValues(Pair::toString);
    serialized.to(outputTopic);
}

Changes to the state store can be watched using the console consumer:

./bin/kafka-console-consumer --topic <changelog-topic-name> --bootstrap-server localhost:9092
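For the <changelog-topic-name> placeholder: Kafka Streams names the changelog topic of a store after the application id and the store name, in the form <application.id>-<store name>-changelog. A small helper along these lines can build the name; the class name ChangelogTopicName and the values "pair-app" and "pair-store" are hypothetical and need to be replaced with your own application id and store name.

```java
// Hypothetical helper; the naming pattern is <application.id>-<store name>-changelog.
final class ChangelogTopicName {

    static String of(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        // "pair-app" and "pair-store" are placeholder values.
        System.out.println(of("pair-app", "pair-store")); // pair-app-pair-store-changelog
    }
}
```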

Full source code here: https://github.com/1123/spring-kafka-stream-with-state-store

Original Answer:

The JavaDoc of the org.apache.kafka.streams.kstream.ValueMapper interface states that it is for stateless record-by-record transformations, and that the org.apache.kafka.streams.kstream.Transformer interface, on the other hand, is "for stateful mapping of an input record to zero, one, or multiple new output records."

Therefore I guess the Transformer interface is the appropriate choice for collecting pairs of messages. This choice may only matter when a streaming application fails and restarts, because it then has to recover its state from Kafka.
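Why restarts matter can be sketched in plain Java. Below, FieldPairer keeps the pending message in an instance field, while StorePairer keeps it in an external map that survives the replacement of the pairer instance. The map is only a stand-in for a Kafka Streams state store (which would be restored from its changelog topic); all class names here are hypothetical and no Kafka API is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Keeps the pending value in a field: it is lost when the instance is replaced.
final class FieldPairer {
    private String left;

    String accept(String value) {
        if (left == null) { left = value; return null; }
        String pair = left + "," + value;
        left = null;
        return pair;
    }
}

// Keeps the pending value in an external map that stands in for a state store.
final class StorePairer {
    private final Map<Integer, String> store;

    StorePairer(Map<Integer, String> store) { this.store = store; }

    String accept(String value) {
        String left = store.remove(1);   // null if nothing is pending
        if (left == null) { store.put(1, value); return null; }
        return left + "," + value;
    }
}

final class RestartSketch {
    public static void main(String[] args) {
        new FieldPairer().accept("a");
        // Simulated restart: a fresh instance has forgotten "a".
        System.out.println(new FieldPairer().accept("b"));        // null

        Map<Integer, String> durable = new HashMap<>();
        new StorePairer(durable).accept("a");
        // Simulated restart: a fresh instance still finds "a" in the store.
        System.out.println(new StorePairer(durable).accept("b")); // a,b
    }
}
```

With the field-based variant, a restart between the two halves of a pair silently shifts the pairing by one message, which is the situation the state store is meant to avoid.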

Hence, here is another solution based upon the org.apache.kafka.streams.kstream.Transformer interface:

class PairTransformerSupplier<K,V> implements TransformerSupplier<K, V, KeyValue<K, Pair<V,V>>> {

    @Override
    public Transformer<K, V, KeyValue<K, Pair<V, V>>> get() {
        return new PairTransformer<>();
    }
}

public class PairTransformer<K,V> implements Transformer<K, V, KeyValue<K, Pair<V, V>>> {
    private V left;

    @Override
    public void init(ProcessorContext context) {
        left = null;
    }

    @Override
    public KeyValue<K, Pair<V, V>> transform(K key, V value) {
        if (left == null) { left = value; return null; }
        KeyValue<K, Pair<V,V>> result = KeyValue.pair(key, new Pair<>(left, value));
        left = null;
        return result;
    }

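    // Only needed on older Kafka Streams versions in which Transformer still declares punctuate(long).
    @Override
    public KeyValue<K, Pair<V, V>> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() { }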

}

The PairTransformerSupplier is then used as follows:

private void accumulateTwo(KStream<String, String> messages) {
    messages.transform(new PairTransformerSupplier<>())
            .filter((key, value) -> value != null)
            .mapValues(Pair::toString)
            .to("pairs");
}

However, running both solutions within a single process on a topic with a single partition yields exactly the same results. I have not tested a topic with multiple partitions and multiple stream consumers.
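For the untested multi-partition case, one thing to expect is that Kafka Streams processes each input partition in its own task, with its own transformer instance and its own state store. Pairs would therefore presumably be formed among the messages of one partition, not across the topic as a whole. The plain-Java simulation below illustrates that effect under this assumption; the class PerPartitionPairing and the explicit partition numbers are made up for illustration and no Kafka API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: one pending slot per partition, as if each
// partition had its own transformer instance and state store.
final class PerPartitionPairing {

    // partitions[i] is the partition that values[i] was written to.
    static List<String> pairPerPartition(int[] partitions, String[] values) {
        Map<Integer, String> pendingByPartition = new HashMap<>();
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            String left = pendingByPartition.remove(partitions[i]);
            if (left == null) {
                pendingByPartition.put(partitions[i], values[i]); // first half in this partition
            } else {
                pairs.add(left + "," + values[i]);                // second half in this partition
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Messages a and c land in partition 0, b and d in partition 1.
        int[] partitions = {0, 1, 0, 1};
        String[] values = {"a", "b", "c", "d"};
        System.out.println(pairPerPartition(partitions, values)); // [a,c, b,d]
    }
}
```

On a single partition the same input would yield a,b and c,d, so results can differ from the single-partition test once the input topic has more than one partition.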

Answer 2

You should be able to write an accumulator class:

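class Accumulator implements ValueMapper<String, Optional<Tuple<String>>> {
    // First half of the current pair, or null if no value is pending.
    private String pending;

    // ValueMapper declares apply(...), so that is the method to implement.
    @Override
    public Optional<Tuple<String>> apply(String item) {
        if (pending == null) {
            pending = item;
            return Optional.empty();
        }
        Optional<Tuple<String>> result = Optional.of(new Tuple<>(pending, item));
        pending = null;
        return result;
    }
}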

and then process with:

messages.mapValues(new Accumulator())
        .filter((key, value) -> value.isPresent()) // KStream.filter takes a (key, value) predicate
        .to("pairs");

Answer 1: score 2, accepted, 2018-09-28 13:15:13
Answer 2: score 1, 2018-09-26 10:03:06
