
KTable-KTable FK join: can't select foreign key serde

Summary

I'm trying to do a KTable-KTable foreign-key join, but I get an error because Kafka Streams is trying to use a String serde for the foreign key.

I want it to use a Kotlinx Serialization serde. How do I specify that?

Details

I want to join the data of two KTables together, using a foreign-key selector, and remap the values into an aggregate object:

tilesGroupedByChunk
  .join<ChunkTilesAndProtos, SurfaceIndex, SurfacePrototypesData>(
    tilePrototypesTable, // join the prototypes KTable
    { cd: MapChunkData -> cd.chunkPosition.surfaceIndex }, // FK join on SurfaceIndex
    { chunkTiles: MapChunkData, protos: SurfacePrototypesData ->
      ChunkTilesAndProtos(chunkTiles, protos) // remap value 
    },
    namedAs("joining-chunks-tiles-prototypes"),
    materializedAs(
      "joined-chunked-tiles-with-prototypes",
      // `.serde()`- helper function to make a Serde from a Kotlinx Serialization JSON module 
      // see https://github.com/adamko-dev/kotka-streams/blob/38388e74b16f3626a2733df1faea2037b89dee7c/modules/kotka-streams-kotlinx-serialization/src/main/kotlin/dev/adamko/kotka/kxs/jsonSerdes.kt#L48
      jsonMapper.serde(),
      jsonMapper.serde(),
    ),
  )

However, I get an error, because Kafka Streams is using Serdes.String() (my default serde) to deserialize the foreign key. But it is a JSON object, and I want it to use Kotlinx Serialization.

org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. 
Do the Processor's input types match the deserialized types? 
Check the Serde setup and change the default Serdes in 
StreamConfig or provide correct Serdes via method 
parameters. Make sure the Processor can accept the 
deserialized input of type key: myproject.MyTopology$MapChunkDataPosition, and value: org.apache.kafka.streams.kstream.internals.Change.

Note that although incorrect Serdes are a common cause 
of error, the cast exception might have another cause 
(in user code, for example). For example, if a 
processor wires in a store, but casts the generics 
incorrectly, a class cast exception could be raised 
during processing, but the cause would not be wrong Serdes.

Background

The data I'm working with comes from a computer game. The game has maps, called surfaces. Each surface is uniquely identified by a surface index. Each surface has tiles, on an x/y plane. Each tile has a "prototype name", which is the ID of a TilePrototype. Each TilePrototype has information about what that tile does or looks like. I need it for the tile's colour.

Topology

Group tiles by chunk

First I group the tiles into 32x32 chunks, then collect those into a KTable.

/** Each chunk is identified by the surface, and an x/y coordinate */
@Serializable
data class MapChunkDataPosition(
  val position: MapChunkPosition,
  val surfaceIndex: SurfaceIndex,
)

/** Each chunk has 32 tiles */
@Serializable
data class MapChunkData(
  val chunkPosition: MapChunkDataPosition,
  val tiles: Set<MapTile>,
)

// get all incoming tiles and group them by chunk,
// this works successfully
val tilesGroupedByChunk: KTable<MapChunkDataPosition, MapChunkData> =
  buildChunkedTilesTable(tilesTable)
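
For context, `buildChunkedTilesTable` is not shown here. A minimal sketch of what such a step could look like follows - `MapTilePosition` and `toChunkDataPosition()` are assumed, illustrative names, and the exact kotka-streams package names may differ:

import dev.adamko.kotka.extensions.materializedAs // kotka-streams helper (package name assumed)
import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.Grouped
import org.apache.kafka.streams.kstream.KTable

/** Sketch: group per-tile records into 32x32 chunks, with explicit serdes at each step. */
fun buildChunkedTilesTable(
  tilesTable: KTable<MapTilePosition, MapTile>, // assumed input: one record per tile
): KTable<MapChunkDataPosition, MapChunkData> =
  tilesTable
    // re-key each tile by the chunk that contains it - the key type changes
    // here, so serdes must be provided via Grouped, or the defaults are used
    .groupBy(
      { tilePos: MapTilePosition, tile: MapTile ->
        KeyValue.pair(tilePos.toChunkDataPosition(), tile) // assumed helper
      },
      Grouped.with(jsonMapper.serde(), jsonMapper.serde()),
    )
    // collect each chunk's tiles into a set; the subtractor handles retractions
    .aggregate(
      { emptySet<MapTile>() },
      { _, tile, tiles -> tiles + tile },
      { _, tile, tiles -> tiles - tile },
      materializedAs("tiles-per-chunk", jsonMapper.serde(), jsonMapper.serde()),
    )
    // wrap the aggregate into the value type - again with explicit serdes,
    // since the value type changes
    .mapValues(
      { chunkPos, tiles -> MapChunkData(chunkPos, tiles) },
      materializedAs("tiles-grouped-by-chunk-sketch", jsonMapper.serde(), jsonMapper.serde()),
    )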

Group prototypes by surface index

Then I collect all the prototypes by surface index, and aggregate them into a set:


/** Identifier for a surface (a simple wrapper, so I can use a Kotlinx Serialization serde everywhere)*/
@Serializable
data class SurfaceIndex(
  val surfaceIndex: Int
)

/** Each surface has some 'prototypes' - I want this because each tile has a colour */
@Serializable
data class SurfacePrototypesData(
  val surfaceIndex: SurfaceIndex,
  val mapTilePrototypes: Set<MapTilePrototype>,
)

// get all incoming prototypes and group them by surface index,
// this works successfully
val tilePrototypesTable: KTable<SurfaceIndex, SurfacePrototypesData> =
  tilePrototypesTable()
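
As above, `tilePrototypesTable()` is not shown. A minimal sketch, assuming a source table keyed by prototype name, and that each prototype knows its surface (imports as in the previous sketch):

/** Sketch: group prototypes by surface index, with explicit serdes throughout. */
fun tilePrototypesTable(
  protosByName: KTable<String, MapTilePrototype>, // assumed input, keyed by prototype name
): KTable<SurfaceIndex, SurfacePrototypesData> =
  protosByName
    .groupBy(
      // assumes MapTilePrototype carries its SurfaceIndex
      { _, proto -> KeyValue.pair(proto.surfaceIndex, proto) },
      Grouped.with(jsonMapper.serde(), jsonMapper.serde()),
    )
    .aggregate(
      { emptySet<MapTilePrototype>() },
      { _, proto, protos -> protos + proto },
      { _, proto, protos -> protos - proto },
      materializedAs("protos-per-surface", jsonMapper.serde(), jsonMapper.serde()),
    )
    .mapValues(
      { surfaceIndex, protos -> SurfacePrototypesData(surfaceIndex, protos) },
      materializedAs("protos-grouped-by-surface", jsonMapper.serde(), jsonMapper.serde()),
    )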

KTable-KTable FK join

This is the code that causes the error:

/** For each chunk, get all tiles in that chunk, and all prototypes */
@Serializable
data class ChunkTilesAndProtos(
  val chunkTiles: MapChunkData,
  val protos: SurfacePrototypesData
)

tilesGroupedByChunk
  .join<ChunkTilesAndProtos, SurfaceIndex, SurfacePrototypesData>(
    tilePrototypesTable, // join the prototypes
    { cd: MapChunkData -> cd.chunkPosition.surfaceIndex }, // FK join on SurfaceIndex
    { chunkTiles: MapChunkData, protos: SurfacePrototypesData ->
      ChunkTilesAndProtos(chunkTiles, protos) // remap value 
    },
    namedAs("joining-chunks-tiles-prototypes"),
    materializedAs(
      "joined-chunked-tiles-with-prototypes",
      // `.serde()`- helper function to make a Serde from a Kotlinx Serialization JSON module 
      // see https://github.com/adamko-dev/kotka-streams/blob/38388e74b16f3626a2733df1faea2037b89dee7c/modules/kotka-streams-kotlinx-serialization/src/main/kotlin/dev/adamko/kotka/kxs/jsonSerdes.kt#L48
      jsonMapper.serde(),
      jsonMapper.serde(),
    ),
  )
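
For comparison, the `namedAs`/`materializedAs` helpers wrap the plain Kafka Streams `Named` and `Materialized` types; the same join written directly against the 3.0 API would look roughly like this (a sketch, behaviour should be identical):

import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.kstream.Named
import org.apache.kafka.streams.state.KeyValueStore

// sketch: the same FK join without the kotka-streams helpers
val joined: KTable<MapChunkDataPosition, ChunkTilesAndProtos> =
  tilesGroupedByChunk.join(
    tilePrototypesTable,                                   // right-hand table
    { cd: MapChunkData -> cd.chunkPosition.surfaceIndex }, // FK extractor
    { chunkTiles, protos -> ChunkTilesAndProtos(chunkTiles, protos) },
    Named.`as`("joining-chunks-tiles-prototypes"),
    Materialized.`as`<MapChunkDataPosition, ChunkTilesAndProtos, KeyValueStore<Bytes, ByteArray>>(
      "joined-chunked-tiles-with-prototypes"
    )
      .withKeySerde(jsonMapper.serde())
      .withValueSerde(jsonMapper.serde()),
  )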

Full stack trace

org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: MyProject.processor.Topology$MapChunkDataPosition, and value: org.apache.kafka.streams.kstream.internals.Change.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:150)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:253)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:232)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:191)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:172)
at org.apache.kafka.streams.kstream.internals.KTableMapValues$KTableMapValuesProcessor.process(KTableMapValues.java:131)
at org.apache.kafka.streams.kstream.internals.KTableMapValues$KTableMapValuesProcessor.process(KTableMapValues.java:105)
at org.apache.kafka.streams.processor.internals.ProcessorAdapter.process(ProcessorAdapter.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:253)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:232)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:186)
at org.apache.kafka.streams.kstream.internals.TimestampedCacheFlushListener.apply(TimestampedCacheFlushListener.java:54)
at org.apache.kafka.streams.kstream.internals.TimestampedCacheFlushListener.apply(TimestampedCacheFlushListener.java:29)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore$1.apply(MeteredKeyValueStore.java:182)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore$1.apply(MeteredKeyValueStore.java:179)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:107)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.lambda$initInternal$0(CachingKeyValueStore.java:87)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:151)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:109)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:136)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flushCache(CachingKeyValueStore.java:345)
at org.apache.kafka.streams.state.internals.WrappedStateStore.flushCache(WrappedStateStore.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flushCache(ProcessorStateManager.java:487)
at org.apache.kafka.streams.processor.internals.StreamTask.prepareCommit(StreamTask.java:402)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAndFillInConsumedOffsetsAndMetadataPerTaskMap(TaskManager.java:1043)
at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1016)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1017)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:786)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:583)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:555)
Caused by: java.lang.ClassCastException: class MyProjectTopology$MapChunkData cannot be cast to class java.lang.String (MyProject.processor.MyProject$MapChunkData is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:29)
at org.apache.kafka.streams.kstream.internals.foreignkeyjoin.ForeignJoinSubscriptionSendProcessorSupplier$UnbindChangeProcessor.process(ForeignJoinSubscriptionSendProcessorSupplier.java:99)
at org.apache.kafka.streams.kstream.internals.foreignkeyjoin.ForeignJoinSubscriptionSendProcessorSupplier$UnbindChangeProcessor.process(ForeignJoinSubscriptionSendProcessorSupplier.java:69)
at org.apache.kafka.streams.processor.internals.ProcessorAdapter.process(ProcessorAdapter.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
... 30 common frames omitted

Versions

  • Kotlin 1.6.10
  • Kafka Streams 3.0.0
  • Kotlinx Serialization 1.3.2

Answer

Somewhat unexpectedly, the problem was a mistake I had made in the topology definition.

In the final stage of building one of the tables, I mapped the values - but I did not specify the serdes:

    .mapValues { _, v ->
      ChunkTilesAndProtos(v.tiles, v.protos)
    }

So I changed it to specify the serdes. Because mapValues changes the value type, the resulting KTable had silently fallen back to my default Serdes.String(), which the foreign-key join later tried to use - causing the ClassCastException above.

    .mapValues(
      "finalise-web-map-tile-chunk-aggregation",
      materializedAs("web-map-tile-chunks", jsonMapper.serde(), jsonMapper.serde())
    ) { _, v ->
      ChunkTilesAndProtos(v.tiles, v.protos)
    }
// note: this uses extension functions from github.com/adamko-dev/kotka-streams 
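
For reference, the same fix is expressible without the kotka-streams extensions (a sketch; the key type of this intermediate table isn't shown above, so `MapChunkDataPosition` is assumed):

    // sketch: equivalent fix via the plain Kafka Streams API
    .mapValues(
      { _, v -> ChunkTilesAndProtos(v.tiles, v.protos) },
      Named.`as`("finalise-web-map-tile-chunk-aggregation"),
      Materialized.`as`<MapChunkDataPosition, ChunkTilesAndProtos, KeyValueStore<Bytes, ByteArray>>(
        "web-map-tile-chunks"
      )
        .withKeySerde(jsonMapper.serde())
        .withValueSerde(jsonMapper.serde()),
    )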

Finding this was not easy. I did it by putting a breakpoint in the constructor of AbstractStream.java (and some other constructors as well), to see when the keySerde and valueSerde fields were set.

Sometimes a null serde appeared (I think some KTables/KStreams are "virtual", and don't encode/decode to Kafka topics). But I was able to find the operation that caused my problem and define the serdes there, since I was changing the value type.
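
For reference, the Serdes.String() fallback comes from the default serdes in the streams configuration, along these lines (a sketch, with placeholder connection values):

import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.StreamsConfig

// the default serdes apply to every operator that doesn't override them;
// per-type Kotlinx Serialization serdes can't be expressed as a single
// default class, so overriding per operation (as above) is the practical fix
val streamsProps = Properties().apply {
  put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app")            // placeholder
  put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
  put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde::class.java)
  put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde::class.java)
}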
