How to join two Kafka streams and produce the result in a topic with Avro values
Kafka streams joins two specific Avro objects
Basic task
I have two identical streams in Kafka in Avro format. I am trying to do a basic left join on these two streams.
Keys
For the keys in both topics I use timestamps rounded to the millisecond, because both streams carry data from IoT devices that generate a measurement every 20 ms, and both devices are synchronized to UTC time.
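That rounding step can be sketched as follows. This is a minimal, Kafka-free example: the 20 ms measurement interval comes from the description above, while the class name, method name, and sample timestamps are purely illustrative.

```java
// Sketch: snap an epoch-millis timestamp onto the 20 ms measurement grid
// shared by both devices, and use the result as the join key.
public class KeyRounding {
    static final long INTERVAL_MS = 20;

    // Round to the nearest multiple of INTERVAL_MS and format as a string key.
    static String roundedKey(long epochMillis) {
        long rounded = Math.round((double) epochMillis / INTERVAL_MS) * INTERVAL_MS;
        return Long.toString(rounded);
    }

    public static void main(String[] args) {
        System.out.println(roundedKey(1650000000007L)); // → 1650000000000
        System.out.println(roundedKey(1650000000013L)); // → 1650000000020
    }
}
```

Two measurements taken within the same 20 ms slot on both devices then produce the same key, which is what makes the windowed stream-stream join line up.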
Done so far
I have been able to produce a Kafka stream that transforms one stream into a topic by following this tutorial, but unfortunately there is no basic stream-stream join tutorial on the Confluent Developer pages.
Avro Java serialization classes: I have generated three SpecificAvroSerde classes based on the two inputs and the output. Although the input streams are identical, I created separate schemas/classes in case the streams get different schemas in the future. The Avro Java classes are generated at build time without problems.
So here is the input schema; the output and joined-stream schemas follow the same pattern:
{
  "namespace": "pmu.serialization.avro",
  "name": "RawPMU_214",
  "type": "record",
  "fields": [
    {"name": "pmu_id", "type": "int"},
    {"name": "time", "type": "string"},
    {"name": "time_rounded", "type": "string"},
    {"name": "stream_id", "type": "int"},
    {"name": "stat", "type": "string"},
    {"name": "ph_i1_r", "type": "float"},
    {"name": "ph_i1_j", "type": "float"},
    {"name": "ph_i2_r", "type": "float"},
    {"name": "ph_i2_j", "type": "float"},
    {"name": "ph_i3_r", "type": "float"},
    {"name": "ph_i3_j", "type": "float"},
    {"name": "ph_v4_r", "type": "float"},
    {"name": "ph_v4_j", "type": "float"},
    {"name": "ph_v5_r", "type": "float"},
    {"name": "ph_v5_j", "type": "float"},
    {"name": "ph_v6_r", "type": "float"},
    {"name": "ph_v6_j", "type": "float"},
    {"name": "ph_7_r", "type": "float"},
    {"name": "ph_7_j", "type": "float"},
    {"name": "ph_8_r", "type": "float"},
    {"name": "ph_8_j", "type": "float"},
    {"name": "analog", "type": "string"},
    {"name": "digital", "type": "string"},
    {"name": "frequency", "type": "float"},
    {"name": "rocof", "type": "int"},
    {"name": "orderCount", "type": "int"}
  ]
}
Code
The key problem is that I don't know how to correctly implement this part with a value joiner:
KStream<String, RawPMU_Joined> joinedPMU = rawPMUs_214.join(rawPMUs_218,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
I have tried various answers, but I haven't really found any example of complete Java code for a stream-stream join with SpecificAvroSerde.
The complete code at this point:
package io.confluent.developer;

import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import pmu.serialization.avro.RawPMU_214;
import pmu.serialization.avro.RawPMU_218;
import pmu.serialization.avro.RawPMU_Joined;

import java.time.Duration;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

public class TransformStream_join {

    public Topology buildTopology(Properties allProps) {
        final StreamsBuilder builder = new StreamsBuilder();

        // Define input PMU topics
        final String inputPMU_01 = allProps.getProperty("input.topic.pmu1");
        final String inputPMU_02 = allProps.getProperty("input.topic.pmu2");
        final String outputTopic = allProps.getProperty("output.topic.name");

        KStream<String, RawPMU_214> rawPMUs_214 = builder.stream(inputPMU_01);
        KStream<String, RawPMU_218> rawPMUs_218 = builder.stream(inputPMU_02);

        KStream<String, RawPMU_Joined> joinedPMU = rawPMUs_214.join(rawPMUs_218,
            (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
            JoinWindows.of(Duration.ofMillis(20)),
            Joined.with(
                Serdes.String(),
                raw_pmu214AvroSerde(allProps),
                raw_pmu218AvroSerde(allProps))
        );

        joinedPMU.to(outputTopic, Produced.with(Serdes.String(), raw_outAvroSerde(allProps)));
        return builder.build();
    }

    private SpecificAvroSerde<RawPMU_214> raw_pmu214AvroSerde(Properties allProps) {
        SpecificAvroSerde<RawPMU_214> raw_pmu214AvroSerde = new SpecificAvroSerde<>();
        raw_pmu214AvroSerde.configure((Map) allProps, false);
        return raw_pmu214AvroSerde;
    }

    private SpecificAvroSerde<RawPMU_218> raw_pmu218AvroSerde(Properties allProps) {
        SpecificAvroSerde<RawPMU_218> raw_pmu218AvroSerde = new SpecificAvroSerde<>();
        raw_pmu218AvroSerde.configure((Map) allProps, false);
        return raw_pmu218AvroSerde;
    }

    private SpecificAvroSerde<RawPMU_Joined> raw_outAvroSerde(Properties allProps) {
        SpecificAvroSerde<RawPMU_Joined> raw_outAvroSerde = new SpecificAvroSerde<>();
        raw_outAvroSerde.configure((Map) allProps, false);
        return raw_outAvroSerde;
    }

    public void createTopics(Properties allProps) {
        AdminClient client = AdminClient.create(allProps);
        List<NewTopic> topics = new ArrayList<>();
        topics.add(new NewTopic(
            allProps.getProperty("input.topic.pmu1"),
            Integer.parseInt(allProps.getProperty("input.topic.partitions")),
            Short.parseShort(allProps.getProperty("input.topic.replication.factor"))));
        topics.add(new NewTopic(
            allProps.getProperty("input.topic.pmu2"),
            Integer.parseInt(allProps.getProperty("input.topic.partitions")),
            Short.parseShort(allProps.getProperty("input.topic.replication.factor"))));
        topics.add(new NewTopic(
            allProps.getProperty("output.topic.name"),
            Integer.parseInt(allProps.getProperty("output.topic.partitions")),
            Short.parseShort(allProps.getProperty("output.topic.replication.factor"))));
        client.createTopics(topics);
        client.close();
    }

    public Properties loadEnvProperties(String fileName) throws IOException {
        Properties allProps = new Properties();
        FileInputStream input = new FileInputStream(fileName);
        allProps.load(input);
        input.close();
        return allProps;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("This program takes one argument: the path to an environment configuration file.");
        }

        TransformStream_join ts = new TransformStream_join();
        Properties allProps = ts.loadEnvProperties(args[0]);
        allProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        allProps.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
        Topology topology = ts.buildTopology(allProps);

        ts.createTopics(allProps);

        final KafkaStreams streams = new KafkaStreams(topology, allProps);
        final CountDownLatch latch = new CountDownLatch(1);

        // Attach shutdown handler to catch Control-C.
        Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
            @Override
            public void run() {
                streams.close(Duration.ofSeconds(5));
                latch.countDown();
            }
        });

        try {
            streams.start();
            latch.await();
        } catch (Throwable e) {
            System.exit(1);
        }
        System.exit(0);
    }
}
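One thing worth checking with the configure((Map) allProps, false) calls above: the Confluent Avro serdes require the schema.registry.url property to be present in the map they are configured with. A stdlib-only sketch of the minimal map follows; the property key is the real one the serdes use, while the URL value is a placeholder (in the code above the map comes from the loaded properties file):

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeConfig {
    // Build the minimal configuration map a Confluent Avro serde needs.
    // "schema.registry.url" is the actual property key; the URL is a placeholder.
    static Map<String, Object> avroSerdeConfig(String registryUrl) {
        Map<String, Object> config = new HashMap<>();
        config.put("schema.registry.url", registryUrl);
        return config;
    }

    public static void main(String[] args) {
        System.out.println(avroSerdeConfig("http://localhost:8081"));
    }
}
```

If the key is missing, the serde's configure call throws at startup rather than at join time, so it is a quick thing to rule out.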
KStream join: I have since simplified the join code, because I created a joiner class:
KStream<String, RawPMU_Joined> joinedPMU = pmu214Stream.join(pmu218Stream, pmuJoiner,
    JoinWindows.of(Duration.ofMillis(20)),
    Joined.with(
        Serdes.String(),
        raw_pmu214AvroSerde(allProps),
        raw_pmu218AvroSerde(allProps))
);
PMUJoiner class
package io.confluent.developer;

import org.apache.kafka.streams.kstream.ValueJoiner;
import pmu.serialization.avro.RawPMU_214;
import pmu.serialization.avro.RawPMU_218;
import pmu.serialization.avro.RawPMU_Joined;

public class PMUJoiner implements ValueJoiner<RawPMU_218, RawPMU_214, RawPMU_Joined> {

    public RawPMU_Joined apply(RawPMU_218 pmu218Stream, RawPMU_214 pmu214Stream) {
        return RawPMU_Joined.newBuilder()
                // PMU 218
                .setTimeRounded1(pmu218Stream.getTimeRounded())
                .setOrderCount1(pmu218Stream.getOrderCount())
                .setPhI1R1(pmu218Stream.getPhI1R())
                .setPhI1J1(pmu218Stream.getPhI1J())
                .setPhI2R1(pmu218Stream.getPhI2R())
                .setPhI2J1(pmu218Stream.getPhI2J())
                .setPhI3R1(pmu218Stream.getPhI3R())
                .setPhI3J1(pmu218Stream.getPhI3J())
                .setPhV4R1(pmu218Stream.getPhV4R())
                .setPhV4J1(pmu218Stream.getPhV4J())
                .setPhV5R1(pmu218Stream.getPhV5R())
                .setPhV5J1(pmu218Stream.getPhV5J())
                .setPhV6R1(pmu218Stream.getPhV6R())
                .setPhV6J1(pmu218Stream.getPhV6J())
                .setPh7R1(pmu218Stream.getPh7R())
                .setPh7J1(pmu218Stream.getPh7J())
                .setPh8R1(pmu218Stream.getPh8R())
                .setPh8J1(pmu218Stream.getPh8J())
                // PMU 214
                .setTimeRounded2(pmu214Stream.getTimeRounded())
                .setOrderCount2(pmu214Stream.getOrderCount())
                .setPhI1R2(pmu214Stream.getPhI1R())
                .setPhI1J2(pmu214Stream.getPhI1J())
                .setPhI2R2(pmu214Stream.getPhI2R())
                .setPhI2J2(pmu214Stream.getPhI2J())
                .setPhI3R2(pmu214Stream.getPhI3R())
                .setPhI3J2(pmu214Stream.getPhI3J())
                .setPhV4R2(pmu214Stream.getPhV4R())
                .setPhV4J2(pmu214Stream.getPhV4J())
                .setPhV5R2(pmu214Stream.getPhV5R())
                .setPhV5J2(pmu214Stream.getPhV5J())
                .setPhV6R2(pmu214Stream.getPhV6R())
                .setPhV6J2(pmu214Stream.getPhV6J())
                .setPh7R2(pmu214Stream.getPh7R())
                .setPh7J2(pmu214Stream.getPh7J())
                .setPh8R2(pmu214Stream.getPh8R())
                .setPh8J2(pmu214Stream.getPh8J())
                .build();
    }
}
Error
.../pmuStream01/src/main/java/io/confluent/developer/JoinPMUStreams.java:46: error: no suitable method found for join(KStream<String,RawPMU_218>, PMUJoiner, JoinWindows, Joined<String,RawPMU_214,RawPMU_218>)
    KStream<String, RawPMU_Joined> joinedPMU = pmu214Stream.join(pmu218Stream,
                                                            ^
  method KStream.<VO,VR>join(KStream<String,VO>, ValueJoiner<? super RawPMU_214,? super VO,? extends VR>, JoinWindows) is not applicable
    (cannot infer type-variable(s) VO,VR: actual and formal argument lists differ in length)
  method KStream.<VO,VR>join(KStream<String,VO>, ValueJoiner<? super RawPMU_214,? super VO,? extends VR>, JoinWindows, Joined<String,RawPMU_214,VO>) is not applicable
    (cannot infer type-variable(s) VO,VR: argument mismatch; PMUJoiner cannot be converted to ValueJoiner<? super RawPMU_214,? super VO,? extends VR>)
  method KStream.<VO,VR>join(KStream<String,VO>, ValueJoiner<? super RawPMU_214,? super VO,? extends VR>, JoinWindows, StreamJoined<String,RawPMU_214,VO>) is not applicable
    (cannot infer type-variable(s) VO,VR: argument mismatch; PMUJoiner cannot be converted to ValueJoiner<? super RawPMU_214,? super VO,? extends VR>)
  (the remaining KTable and GlobalKTable overloads of join are not applicable for the same reasons)
I don't understand why this happens, since I believe I have supplied the correct return types for all the arguments.
I would suggest starting from this: a joiner function that accepts two Avro objects and returns a third (optionally Avro) object.
(leftValue, rightValue) -> {
    RawPMU_Joined j = new RawPMU_Joined();
    j.set...
    return j;
}
You can look at the generic Avro examples in the confluent-examples repository on GitHub; you don't need one for specific records, because it is simply a different object that you return, it just won't be a String.
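To make the compiler error above concrete: the first type parameter of a ValueJoiner must be the value type of the stream the join is called on (the left side). Since the call is pmu214Stream.join(pmu218Stream, ...), the joiner must put RawPMU_214 first, but PMUJoiner declares RawPMU_218 first, which is why no join overload applies. Below is a minimal, Kafka-free sketch; the ValueJoiner interface mirrors the shape of Kafka's, and the PMU classes are stubs standing in for the generated Avro types.

```java
// Kafka-free sketch of why the compiler rejects PMUJoiner: the joiner's first
// type parameter must be the LEFT stream's value type. The interface below
// mirrors org.apache.kafka.streams.kstream.ValueJoiner; the PMU classes are
// stubs, not the real generated Avro classes.
public class JoinerOrder {
    interface ValueJoiner<V1, V2, VR> { VR apply(V1 left, V2 right); }

    static class RawPMU_214 { final int orderCount; RawPMU_214(int c) { orderCount = c; } }
    static class RawPMU_218 { final int orderCount; RawPMU_218(int c) { orderCount = c; } }
    static class RawPMU_Joined {
        final int orderCount214, orderCount218;
        RawPMU_Joined(int a, int b) { orderCount214 = a; orderCount218 = b; }
    }

    // pmu214Stream.join(pmu218Stream, ...) puts RawPMU_214 on the left, so the
    // joiner has to be ValueJoiner<RawPMU_214, RawPMU_218, RawPMU_Joined>.
    // Swapping the first two type parameters reproduces the reported error.
    static class PMUJoiner implements ValueJoiner<RawPMU_214, RawPMU_218, RawPMU_Joined> {
        public RawPMU_Joined apply(RawPMU_214 left, RawPMU_218 right) {
            return new RawPMU_Joined(left.orderCount, right.orderCount);
        }
    }

    public static void main(String[] args) {
        RawPMU_Joined j = new PMUJoiner().apply(new RawPMU_214(1), new RawPMU_218(2));
        System.out.println(j.orderCount214 + "," + j.orderCount218); // prints 1,2
    }
}
```

So either flip the type parameters on the real PMUJoiner to ValueJoiner<RawPMU_214, RawPMU_218, RawPMU_Joined> (and swap its setter arguments accordingly), or call the join as pmu218Stream.join(pmu214Stream, ...) and reorder the serdes in Joined.with to match.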