
Spark Streaming HUDI HoodieException: Config conflict(key current value existing value): RecordKey:


When I use Spark to connect to a Kafka topic, create a DataFrame, and then store it in Hudi:

df
  .selectExpr("topic", "partition", "offset", "timestamp", "timestampType",
              "CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .writeStream
  .format("hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key(), "essDateTime")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.ComplexKeyGenerator")
  .option(RECORDKEY_FIELD.key(), "offset,timestamp") // "offset,essDateTime"
  .option(TBL_NAME.key, streamingTableName)
  .option("path", baseStreamingPath)
  .trigger(ProcessingTime(10000))
  .outputMode("append")
  .option("checkpointLocation", checkpointLocation)
  .start()

I get the following exception:

[ERROR] 2023-01-31 09:35:25.474 [stream execution thread for [id = 8b30fd4b-8506-490b-80ad-76868c14594f, runId = 25d34e6f-10e2-42c2-b094-654797f5d79c]] HoodieStreamingSink - Micro batch id=1 threw following exception:
org.apache.hudi.exception.HoodieException: Config conflict(key  current value   existing value):
RecordKey:  offset,timestamp    uuid
KeyGenerator:   org.apache.hudi.keygen.ComplexKeyGenerator  org.apache.hudi.keygen.SimpleKeyGenerator
    at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:167) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:90) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$2(HoodieStreamingSink.scala:129) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.15.jar:?]
    at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$1(HoodieStreamingSink.scala:128) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:214) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:127) ~[hudi-spark3-bundle_2.12-0.12.2.jar:0.12.2]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:666) ~[spark-sql_2.12-3.3.1.jar:3.3.1]

I want to store all the Kafka data in a Hudi table.

In Apache Hudi, some table configs cannot be overridden once the table exists, and the KeyGenerator is one of them. It appears your table was originally created with org.apache.hudi.keygen.SimpleKeyGenerator (and record key uuid, as the exception shows), so you need to recreate the table to change the key generator and the record key.
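If recreating the table is not an option, a minimal sketch of the alternative is to make the streaming writer match the table's stored configuration instead of fighting it. The values below are taken from the exception message; whether the DataFrame actually contains a uuid column is an assumption:

```scala
// Sketch: keep the existing table by matching its stored config.
// The exception shows the table was created with record key "uuid" and
// SimpleKeyGenerator, so the writer must use the same values
// (assumes the DataFrame really has a "uuid" column to key on).
df.writeStream
  .format("hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key(), "essDateTime")
  .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.SimpleKeyGenerator")   // must match existing table
  .option(RECORDKEY_FIELD.key(), "uuid")                 // must match existing table
  .option(TBL_NAME.key, streamingTableName)
  .option("path", baseStreamingPath)
  .option("checkpointLocation", checkpointLocation)
  .outputMode("append")
  .start()
```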

If you want a quick test, you can change baseStreamingPath to write the data into a new Hudi table.
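A sketch of that quick test, with a hypothetical new path; note that the checkpoint location should also be fresh, since the old checkpoint belongs to the old sink:

```scala
// Sketch: point the stream at an unused table path so the new key
// configuration takes effect. The path below is a hypothetical example.
val newStreamingPath = "/tmp/hudi/streaming_table_v2"

df.writeStream
  .format("hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD.key(), "essDateTime")
  .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.ComplexKeyGenerator")
  .option(RECORDKEY_FIELD.key(), "offset,timestamp")
  .option(TBL_NAME.key, streamingTableName)
  .option("path", newStreamingPath)                       // new table, no existing config to conflict
  .option("checkpointLocation", checkpointLocation + "_v2") // fresh checkpoint for the new sink
  .outputMode("append")
  .start()
```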

