[英]Reading peek topic from kafka streams
我有一個主題名稱,即push-processing-KSTREAM-PEEK-0000000014-repartition
,這是 kafka 的內部主題。 我沒有創建這個主題,我在重新分區后使用.peek()
方法並使用 peek 方法 3-4 次。
我的問題是我可以從 topic topic read push-processing-KSTREAM-PEEK-0000000014-repartition
閱讀,但是當我說topic read push-processing-KSTREAM-PEEK-0000000014-repartition --from-beginning
時我無法閱讀。
這個內部話題是因為peek
方法而創建的,對吧?
還是它與其他重新分區流代碼有關,但它的名稱是KSTREEAM-PEEK
?
它有 50 個分區。 由於peek
是無狀態操作,它不應該創建內部主題,但為什么它的名稱與peek
相關,為什么我不能從頭開始閱讀?
請有任何想法/
這是第一個拓撲:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [appconnect_deviceIds_exported_for_push])
--> KSTREAM-FLATMAP-0000000004
Processor: KSTREAM-FLATMAP-0000000004 (stores: [])
--> KSTREAM-PEEK-0000000005
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-PEEK-0000000005 (stores: [])
--> KSTREAM-FILTER-0000000007
<-- KSTREAM-FLATMAP-0000000004
Processor: KSTREAM-FILTER-0000000007 (stores: [])
--> KSTREAM-SINK-0000000006
<-- KSTREAM-PEEK-0000000005
Sink: KSTREAM-SINK-0000000006 (topic: KSTREAM-PEEK-0000000005-repartition)
<-- KSTREAM-FILTER-0000000007
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000008 (topics: [KSTREAM-PEEK-0000000005-repartition])
--> KSTREAM-JOIN-0000000009
Source: KSTREAM-SOURCE-0000000028 (topics: [KSTREAM-PEEK-0000000025-repartition])
--> KSTREAM-JOIN-0000000029
Processor: KSTREAM-JOIN-0000000009 (stores: [appconnect_device_stream-STATE-STORE-0000000001])
--> KSTREAM-MAP-0000000010
<-- KSTREAM-SOURCE-0000000008
Processor: KSTREAM-JOIN-0000000029 (stores: [appconnect_device_stream-STATE-STORE-0000000001])
--> KSTREAM-PEEK-0000000030
<-- KSTREAM-SOURCE-0000000028
Processor: KSTREAM-MAP-0000000010 (stores: [])
--> KSTREAM-PEEK-0000000011
<-- KSTREAM-JOIN-0000000009
Processor: KSTREAM-PEEK-0000000030 (stores: [])
--> KSTREAM-MAP-0000000031
<-- KSTREAM-JOIN-0000000029
Processor: KSTREAM-MAP-0000000031 (stores: [])
--> KSTREAM-SINK-0000000032
<-- KSTREAM-PEEK-0000000030
Processor: KSTREAM-PEEK-0000000011 (stores: [])
--> KSTREAM-SINK-0000000012
<-- KSTREAM-MAP-0000000010
Source: KSTREAM-SOURCE-0000000002 (topics: [appconnect_device_stream])
--> KTABLE-SOURCE-0000000003
Sink: KSTREAM-SINK-0000000012 (topic: appconnect_devices_exported_for_push)
<-- KSTREAM-PEEK-0000000011
Sink: KSTREAM-SINK-0000000032 (topic: appconnect_devices_exported_for_push)
<-- KSTREAM-MAP-0000000031
Processor: KTABLE-SOURCE-0000000003 (stores: [appconnect_device_stream-STATE-STORE-0000000001])
--> none
<-- KSTREAM-SOURCE-0000000002
Sub-topology: 2
Source: KSTREAM-SOURCE-0000000013 (topics: [appconnect_userIds_exported_for_push])
--> KSTREAM-FLATMAP-0000000017
Processor: KSTREAM-FLATMAP-0000000017 (stores: [])
--> KSTREAM-PEEK-0000000018
<-- KSTREAM-SOURCE-0000000013
Processor: KSTREAM-PEEK-0000000018 (stores: [])
--> KSTREAM-FILTER-0000000020
<-- KSTREAM-FLATMAP-0000000017
Processor: KSTREAM-FILTER-0000000020 (stores: [])
--> KSTREAM-SINK-0000000019
<-- KSTREAM-PEEK-0000000018
Sink: KSTREAM-SINK-0000000019 (topic: KSTREAM-PEEK-0000000018-repartition)
<-- KSTREAM-FILTER-0000000020
Sub-topology: 3
Source: KSTREAM-SOURCE-0000000021 (topics: [KSTREAM-PEEK-0000000018-repartition])
--> KSTREAM-JOIN-0000000022
Processor: KSTREAM-JOIN-0000000022 (stores: [appconnect_user_stream-STATE-STORE-0000000014])
--> KSTREAM-PEEK-0000000023
<-- KSTREAM-SOURCE-0000000021
Processor: KSTREAM-PEEK-0000000023 (stores: [])
--> KSTREAM-MAP-0000000024
<-- KSTREAM-JOIN-0000000022
Processor: KSTREAM-MAP-0000000024 (stores: [])
--> KSTREAM-PEEK-0000000025
<-- KSTREAM-PEEK-0000000023
Processor: KSTREAM-PEEK-0000000025 (stores: [])
--> KSTREAM-FILTER-0000000027
<-- KSTREAM-MAP-0000000024
Processor: KSTREAM-FILTER-0000000027 (stores: [])
--> KSTREAM-SINK-0000000026
<-- KSTREAM-PEEK-0000000025
Source: KSTREAM-SOURCE-0000000015 (topics: [appconnect_user_stream])
--> KTABLE-SOURCE-0000000016
Sink: KSTREAM-SINK-0000000026 (topic: KSTREAM-PEEK-0000000025-repartition)
<-- KSTREAM-FILTER-0000000027
Processor: KTABLE-SOURCE-0000000016 (stores: [appconnect_user_stream-STATE-STORE-0000000014])
--> none
<-- KSTREAM-SOURCE-0000000015
也就是第二步,
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000017 (topics: [KSTREAM-PEEK-0000000014-repartition])
--> KSTREAM-JOIN-0000000018
Processor: KSTREAM-JOIN-0000000018 (stores: [appconnect_push_processing_submissions-STATE-STORE-0000000000])
--> KSTREAM-FILTER-0000000019
<-- KSTREAM-SOURCE-0000000017
Processor: KSTREAM-FILTER-0000000019 (stores: [])
--> KSTREAM-SINK-0000000020
<-- KSTREAM-JOIN-0000000018
Source: KSTREAM-SOURCE-0000000001 (topics: [appconnect_push_processing_submissions])
--> KTABLE-SOURCE-0000000002
Sink: KSTREAM-SINK-0000000020 (topic: appconnect_push_send_bulk)
<-- KSTREAM-FILTER-0000000019
Processor: KTABLE-SOURCE-0000000002 (stores: [appconnect_push_processing_submissions-STATE-STORE-0000000000])
--> none
<-- KSTREAM-SOURCE-0000000001
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000003 (topics: [appconnect_devices_exported_for_push])
--> KSTREAM-MAP-0000000007
Processor: KSTREAM-MAP-0000000007 (stores: [])
--> KSTREAM-PEEK-0000000008
<-- KSTREAM-SOURCE-0000000003
Processor: KSTREAM-PEEK-0000000008 (stores: [])
--> KSTREAM-FILTER-0000000010
<-- KSTREAM-MAP-0000000007
Processor: KSTREAM-FILTER-0000000010 (stores: [])
--> KSTREAM-SINK-0000000009
<-- KSTREAM-PEEK-0000000008
Sink: KSTREAM-SINK-0000000009 (topic: KSTREAM-PEEK-0000000008-repartition)
<-- KSTREAM-FILTER-0000000010
Sub-topology: 2
Source: KSTREAM-SOURCE-0000000011 (topics: [KSTREAM-PEEK-0000000008-repartition])
--> KSTREAM-LEFTJOIN-0000000012
Processor: KSTREAM-LEFTJOIN-0000000012 (stores: [appconnect_user_stream-STATE-STORE-0000000004])
--> KSTREAM-KEY-SELECT-0000000013
<-- KSTREAM-SOURCE-0000000011
Processor: KSTREAM-KEY-SELECT-0000000013 (stores: [])
--> KSTREAM-PEEK-0000000014
<-- KSTREAM-LEFTJOIN-0000000012
Processor: KSTREAM-PEEK-0000000014 (stores: [])
--> KSTREAM-FILTER-0000000016
<-- KSTREAM-KEY-SELECT-0000000013
Processor: KSTREAM-FILTER-0000000016 (stores: [])
--> KSTREAM-SINK-0000000015
<-- KSTREAM-PEEK-0000000014
Source: KSTREAM-SOURCE-0000000005 (topics: [appconnect_user_stream])
--> KTABLE-SOURCE-0000000006
Sink: KSTREAM-SINK-0000000015 (topic: KSTREAM-PEEK-0000000014-repartition)
<-- KSTREAM-FILTER-0000000016
Processor: KTABLE-SOURCE-0000000006 (stores: [appconnect_user_stream-STATE-STORE-0000000004])
--> none
<-- KSTREAM-SOURCE-0000000005
我所有的這些操作都使用相同的 KEY。 對於所有主題,我有 5 個代理和 50 個分區。 我有 10 個並發,我將我的應用程序擴展到 5。但是就像我說的那樣,我正在對同一個鍵進行重新分區和傳輸 3-4 次數據。 這意味着我的所有值都與 flatMap、map 操作 go 相關到同一個分區。 1 或 2 次我使用不同的密鑰,因此消息分發到不同的分區,只有 1-2 次。 這會影響我的表現嗎? 或者我絕對應該分布在不同的分區上以提高我的性能。
因此,基本上 kafka 在僅使用主題之間的分區執行 3-4 次連接或重新分區操作時表現出更好的性能,因為 kafka 將僅從一個分區讀取,並且實際上知道在哪里讀取並立即讀取所有數據,因為磁盤上物理並行的數據(我的意思是 ssd 或 hdd)。 或者我的第二種情況; 我絕對應該使用更多的分區來並行讀取分區之間的數據嗎?
而且我還認為使用 peek 會減慢我的進程。
peek()
操作無關。 查看您發布的程序(部分)的拓撲描述如下:
KStream inputUser = builder.stream().flatMap().peek().filter();
KStream inputDevice = builder.stream().flatMap().peek().filter();
inputUser.join(inputDevice,...)
(如果您也將代碼發布在問題中會更容易)。
因為您調用flatMap()
Kafka Streams 假定您更改了密鑰,因此調用join()
會觸發數據重新分區。 重新分區主題名稱由上游 operatore 生成(我不是 100% 確定為什么選擇PEEK
而不是FILTER
是公平的。)
我所有的這些操作都使用相同的 KEY。
對於這種情況,您可能希望使用flatMapValues()
而不是flatMap()
。 對於這種情況,Kafka Streams 知道密鑰沒有改變,因此它不會創建重新分區主題。
同樣,如果鍵未更改以避免不必要的重新分區,您可能希望使用mapValues()
而不是map()
。
我的問題是我可以閱讀主題“主題閱讀推送處理-KSTREAM-PEEK-0000000014-repartition”,但當我說“主題閱讀推送處理-KSTREAM-PEEK-0000000014-repartition --from-beginning”時我無法閱讀"
我不確定你的意思是什么。 做什么
當我說“主題讀取 push-processing-KSTREAM-PEEK-0000000014-repartition --from-beginning”時
意思是? 你是指命令行工具bin/kafka-consumer.sh
嗎? 一般來說,是的,您可以閱讀重新分區主題,但我不確定為什么這會有用?
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.