
Kafka Streams cast to string issues with KTable when grouping and aggregating

I have a Kafka stream with incoming messages that looks like sensor_code: x, time: 1526978768, address: Y I want to create a KTable that stores each unique address at each sensor code. 我有一个Kafka流,其传入的消息看起来像sensor_code: x, time: 1526978768, address: Y我想创建一个KTable,它存储每个传感器代码的每个唯一地址。

The KTable:

KTable<String, Long> numCount = streams
            .map(kvm1)
            .groupByKey(Serialized.with(stringSerde, stringSerde))
            .count()
            .groupBy(kvm2, Serialized.with(stringSerde, longSerde))
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("StateStore"));

Where kvm1 and kvm2 are my own KeyValueMappers. My idea was to replace the existing key with sensor_code=x, address=y, then perform a groupByKey() and count(). Then another groupBy(kvm2, Serialized.with(stringSerde, longSerde)), where kvm2 modifies the existing key to contain only the sensor_code, so that the value becomes the count of unique addresses for that sensor. But since it is not working, maybe I am doing it wrong... It tries to cast the value as a Long and throws an exception, because it is looking for a String. I want the count as a Long, right?
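To make the intended two-level aggregation concrete, here is how I expect a record to flow through the topology (the sample values below are hypothetical):

    // Intended record flow through the topology (hypothetical sample values):
    // input:                value = {"sensor_code": "A", "time": 1526978768, "address": "M"}
    // after map(kvm1):      key = sensor_id="A", address="M"   value = <original JSON>
    // after first count():  key = sensor_id="A", address="M"   value = 3  (occurrences of this pair)
    // after groupBy(kvm2):  key = sensor_id="A"                value = 3
    // after second count(): key = sensor_id="A"                value = 2  (unique addresses seen for A)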

Here is the first KeyValueMapper I use, with its respective helper function:

private static String getKeySensorIdAddress(String o) {
    String x = "sensor_id=\"x\", address=\"y\"";
    try {
        WifiStringEvent event = mapper.readValue(o, WifiStringEvent.class);
        x = x.replace("x", event.getSensor_code());
        x = x.replace("y", event.getAddress());
        return x;
    } catch (Exception ex) {
        System.out.println("Error... " + ex);
        return "Error";
    }
}

// KeyValueMapper 1
KeyValueMapper<String, String, KeyValue<String, String>> kvm1 =
    new KeyValueMapper<String, String, KeyValue<String, String>>() {
        public KeyValue<String, String> apply(String key, String value) {
            return new KeyValue<>(getKeySensorIdAddress(value), value);
        }
    };

Here is the second KeyValueMapper and its helper function:

private static String getKeySensorId(String o) {
    int a = o.indexOf(",");
    return o.substring(0, a);
}

// KeyValueMapper 2
KeyValueMapper<String, Long, KeyValue<String, Long>> kvm2 =
    new KeyValueMapper<String, Long, KeyValue<String, Long>>() {
        public KeyValue<String, Long> apply(String key, Long value) {
            return new KeyValue<>(getKeySensorId(key), value);
        }
    };

Here is the exception that is thrown when I try to run the code:

[2018-05-29 15:28:40,119] ERROR stream-thread [testUniqueAddresses-ed48daf8-fff0-42e4-bb5a-687584734b45-StreamThread-1] Failed to process stream task 2_0 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks:105)
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String
    at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:28)
    at org.apache.kafka.streams.state.StateSerdes.rawValue(StateSerdes.java:178)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:66)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:57)
    at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:198)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
    at org.apache.kafka.streams.kstream.internals.KTableAggregate$KTableAggregateProcessor.process(KTableAggregate.java:95)
    at org.apache.kafka.streams.kstream.internals.KTableAggregate$KTableAggregateProcessor.process(KTableAggregate.java:56)

Note the java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String error.

Any ideas why I get this error and how I can fix it, or advice on how to edit the code to reach the desired output described above?

Many thanks in advance!

EDIT: Made a major overhaul of my question, since I have abandoned one of the approaches.

In the first case, if you want to use a HashMap as the value type, you need to define a custom serde for it and pass it using Materialized.withValueSerde.
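For illustration, here is a minimal sketch of such a custom serde, assuming a HashMap<String, Long> value type and Jackson for the byte format (the class name and generic types here are hypothetical, not taken from the question):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.common.serialization.Deserializer;
    import org.apache.kafka.common.serialization.Serde;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.serialization.Serializer;
    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public final class HashMapSerde {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        // Builds a Serde that writes the map as JSON bytes and reads it back.
        public static Serde<HashMap<String, Long>> hashMapSerde() {
            return Serdes.serdeFrom(new Serializer<HashMap<String, Long>>() {
                @Override public void configure(Map<String, ?> configs, boolean isKey) {}
                @Override public byte[] serialize(String topic, HashMap<String, Long> data) {
                    try {
                        return data == null ? null : MAPPER.writeValueAsBytes(data);
                    } catch (Exception e) {
                        throw new RuntimeException("Failed to serialize HashMap value", e);
                    }
                }
                @Override public void close() {}
            }, new Deserializer<HashMap<String, Long>>() {
                @Override public void configure(Map<String, ?> configs, boolean isKey) {}
                @Override public HashMap<String, Long> deserialize(String topic, byte[] bytes) {
                    try {
                        return bytes == null ? null
                            : MAPPER.readValue(bytes, new TypeReference<HashMap<String, Long>>() {});
                    } catch (Exception e) {
                        throw new RuntimeException("Failed to deserialize HashMap value", e);
                    }
                }
                @Override public void close() {}
            });
        }
    }

You would then pass it via Materialized.withValueSerde(HashMapSerde.hashMapSerde()) on the aggregation that produces the HashMap value.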

In the second case I can't say without seeing the return types of your KeyValueMappers and the exact error message: is it trying to cast a String to a Long, or vice versa?

EDIT: Thanks for sharing the extra info.

I think what you need in the second case is to also specify the value serde in the second count operation. There seems to be an inconsistency between count() on a KGroupedStream and on a KGroupedTable, in that the former automatically sets the value serde to LongSerde:

https://github.com/apache/kafka/blob/1.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KGroupedStreamImpl.java#L281-L283

but the KGroupedTable version doesn't:

https://github.com/apache/kafka/blob/1.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KGroupedTableImpl.java#L253

It seems to have already been fixed on trunk, but the fix has not been released yet:

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KGroupedTableImpl.java#L158-L160
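Until that fix is released, a workaround is to set the value serde explicitly on the second count() via Materialized.withValueSerde. A minimal sketch, reusing the topology from the question:

    KTable<String, Long> numCount = streams
            .map(kvm1)
            .groupByKey(Serialized.with(stringSerde, stringSerde))
            .count()
            .groupBy(kvm2, Serialized.with(stringSerde, longSerde))
            // Explicitly set the Long value serde so the state store does not
            // fall back to the default String serde and throw the
            // ClassCastException shown above.
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("StateStore")
                    .withValueSerde(Serdes.Long()));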
