How to enable caching on in-memory Kafka Streams state store
I want to reduce the amount of data being sent downstream, and since I only care about the last value of a given key, I'm reading data from a topic this way:
KTable<String, String> table = builder.table("inputTopic", Materialized.as("myStore"));
Why? Because under the hood the data is being cached, as described here, and forwarded downstream only when commit.interval.ms or cache.max.bytes.buffering kicks in.
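For reference, those two thresholds are ordinary Streams configuration properties. A minimal sketch of setting them (the 10-second and 10 MB values below are just example figures, not recommendations):

```java
import java.util.Properties;

public class CacheThresholds {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The record cache is flushed (and cached updates forwarded downstream)
        // at least once per commit interval.
        props.put("commit.interval.ms", "10000");
        // Total bytes reserved for record caches across all stream threads;
        // setting this to 0 disables caching entirely.
        props.put("cache.max.bytes.buffering", String.valueOf(10 * 1024 * 1024L));
        System.out.println(props.getProperty("commit.interval.ms"));
        System.out.println(props.getProperty("cache.max.bytes.buffering"));
    }
}
```

Whichever threshold is reached first triggers the flush, so lowering either one makes updates visible downstream sooner.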
So far so good, but since in this case I'm not taking advantage of RocksDB at all, I'd like to replace it with the default implementation of an in-memory store. I explicitly enable caching, just in case:
Materialized.as(Stores.inMemoryKeyValueStore("myStore")).withCachingEnabled();
It doesn't work, though: the data is not being cached and every record is sent downstream.
Is there another way to enable caching? Or perhaps a better way to achieve what I'm trying to do?
It seems I was wrong, and in-memory state store caching works as expected. I'll briefly show how I tested it; perhaps someone will find it useful. I made a very basic Kafka Streams application that just reads from a topic abstracted as a KTable.
import static org.apache.kafka.streams.StreamsConfig.*;

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        Logger logger = LoggerFactory.getLogger(Main.class);

        builder.table("inputTopic",
                Materialized.as(Stores.inMemoryKeyValueStore("myStore")).withCachingEnabled())
            .toStream()
            .foreach((k, v) -> logger.info("Result: {} - {}", k, v));

        new KafkaStreams(builder.build(), getProperties()).start();
    }

    private static Properties getProperties() {
        Properties properties = new Properties();
        properties.put(APPLICATION_ID_CONFIG, "testApp");
        properties.put(BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(COMMIT_INTERVAL_MS_CONFIG, 10000);
        properties.put(CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
        properties.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        properties.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        return properties;
    }
}
Then I ran the console producer from Kafka:
./kafka-console-producer.sh --broker-list localhost:9092 --topic inputTopic --property "parse.key=true" --property "key.separator=:"
And sent a few messages: a:a, a:b, a:c. Only the last of them was visible in the app, so the cache works as expected:
2018-03-06 21:21:57 INFO Main:26 - Result: a - c
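The behavior can be pictured with a plain-Java toy model (an illustration only, not the real Streams record cache): updates for the same key overwrite each other in memory, and only the latest value per key is forwarded when the cache is flushed on commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheSketch {
    // In-memory buffer: later updates for a key replace earlier ones.
    private final Map<String, String> cache = new LinkedHashMap<>();

    void put(String key, String value) {
        cache.put(key, value);
    }

    // Simulates a commit: forward exactly one record per key, then clear.
    void flush() {
        cache.forEach((k, v) -> System.out.println("Result: " + k + " - " + v));
        cache.clear();
    }

    public static void main(String[] args) {
        CacheSketch c = new CacheSketch();
        c.put("a", "a");
        c.put("a", "b");
        c.put("a", "c");
        c.flush(); // only the last update for key "a" is forwarded
    }
}
```

This mirrors the observed output above: three inputs for key `a`, a single `Result: a - c` downstream.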
I've also changed the stream slightly to check the caching of the aggregate method.
builder.stream("inputTopic")
    .groupByKey()
    .aggregate(() -> "", (k, v, a) -> a + v, Materialized.as(Stores.inMemoryKeyValueStore("aggregate")))
    .toStream()
    .foreach((k, v) -> logger.info("Result: {} - {}", k, v));
I sent a few messages in rapid succession with the same key, and I received just a single result, so the data was not being sent downstream right away, exactly as intended.
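The same idea applies to the aggregate case; again as a plain-Java illustration rather than the real implementation: successive values for a key are folded into an in-memory accumulator, and only the final accumulator is emitted when the cache is flushed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregateSketch {
    public static void main(String[] args) {
        Map<String, String> agg = new LinkedHashMap<>();
        for (String v : new String[] {"a", "b", "c"}) {
            // Mirrors aggregate(() -> "", (k, v, a) -> a + v):
            // start from the empty-string initializer, append each value.
            agg.put("a", agg.getOrDefault("a", "") + v);
        }
        // Flush: a single downstream record per key, carrying the final accumulator.
        agg.forEach((k, v) -> System.out.println("Result: " + k + " - " + v));
    }
}
```

Three rapid inputs for key `a` produce one downstream record with the fully concatenated accumulator, matching what the application logged.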