![](/img/trans.png)
[英]Does Kafka Streams GlobalKTable topic require the same number of partitions as KStream topic which it will be joining with?
[英]Kafka KStream to GlobalKTable join does not work with same key used
我有一個非常令人沮喪的問題,嘗試將由使用KafkaProducer的Java驅動程序填充的KStream合並到從Topic填充的GlobalKTable中,該主題又由使用JDBCConnector從MySQL Table提取數據的填充。 無論我做什么嘗試,都必須將KStream和GlobalKTable的鍵值都設置為相同的值,否則將無法正常工作。 我的意思是從未調用過ValueJoiner。 我將在下面顯示相關的配置和代碼來嘗試解釋。 感謝您的幫助。
我正在使用最新版本的融合平台。
填充GlobalKTable的主題來自單個MySQL表:
Column Name/Type:
pk/bigint(20)
org_name/varchar(255)
orgId/varchar(10)
為此,JDBCConnector配置為:
name=my-demo
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
connection.url=jdbc:mysql://localhost:3306/reporting?user=root&password=XXX
table.whitelist=organisation
mode=incrementing
incrementing.column.name=pk
topic.prefix=my-
transforms=keyaddition
transforms.keyaddition.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.keyaddition.fields=orgId
我正在使用命令行運行JDBC連接器:
connect-standalone /home/jim/platform/confluent/etc/schema-registry/connect-avro-standalone.properties /home/jim/prg/kafka/config/my.mysql.properties
這給了我一個名為my-organisation的主題,到目前為止,它的主題是orgId .....! (請注意,名稱空間似乎不是由JDBCConnector設置的,但我認為這不是問題,但我不確定)
現在,代碼。 這是初始化和創建GlobalKTable的方法(顯示了相關代碼):
final Map<String, String> serdeConfig =
Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
schemaRegistryUrl);
final StreamsBuilder builder = new StreamsBuilder();
final SpecificAvroSerde<Organisation> orgSerde = new SpecificAvroSerde<>();
orgSerde.configure(serdeConfig, false);
// Create the GlobalKTable from the topic that was populated using the connect-standalone command line
final GlobalKTable<String, Organisation>
orgs =
builder.globalTable(ORG_TOPIC, Materialized.<String, Organisation, KeyValueStore<Bytes, byte[]>>as(ORG_STORE)
.withKeySerde(Serdes.String())
.withValueSerde(orgSerde));
從中生成Organisaton類的avro模式定義為:
{"namespace": "io.confluent.examples.streams.avro",
"type":"record",
"name":"Organisation",
"fields":[
{"name": "pk", "type":"long"},
{"name": "org_name", "type":"string"},
{"name": "orgId", "type":"string"}
]
}
注意:如上所述,使用單消息轉換(SMT)操作將orgId設置為主題的鍵。
因此,這就是GlobalKTable設置。
現在進行KStream設置(連接的右側)。 它具有與globalKTable相同的鍵(orgId)。 為此,我使用了一個簡單的驅動程序:
(用例是該主題將包含與每個組織相關的事件)
public class UploadGenerator {
public static void main(String[] args){
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
KafkaProducer producer = new KafkaProducer(props);
// This schema is also used in the consumer application or more specifically a class generated from it.
String mySchema = "{\"namespace\": \"io.confluent.examples.streams.avro\"," +
"\"type\":\"record\"," +
"\"name\":\"DocumentUpload\"," +
"\"fields\":[{\"name\":\"orgId\",\"type\":\"string\"}," +
"{\"name\":\"date\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}]}";
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(mySchema);
// Just using three fictional organisations with the following orgIds/keys
String[] ORG_ARRAY = {"002", "003", "004"};
long count = 0;
String key = ""; // key is the realm
while(true) {
count++;
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
}
GenericRecord avroRecord = new GenericData.Record(schema);
int orgId = ThreadLocalRandom.current().nextInt(0, 2 + 1);
avroRecord.put("orgId",ORG_ARRAY[orgId]);
avroRecord.put("date",new Date().getTime());
key = ORG_ARRAY[orgId];
ProducerRecord<Object, Object> record = new ProducerRecord<>("topic_uploads", key, avroRecord);
try {
producer.send(record);
producer.flush();
} catch(SerializationException e) {
System.out.println("Exccccception was generated! + " + e.getMessage());
} catch(Exception el) {
System.out.println("Exception: " + el.getMessage());
}
}
}
}
因此,這將生成一個新事件,該事件代表由orgId代表的組織的上載,並且還專門在ProducerRecord中使用的鍵變量中進行設置。
以下是為這些事件設置KStream的代碼:
final SpecificAvroSerde<DocumentUpload> uploadSerde = new SpecificAvroSerde<>();
uploadSerde.configure(serdeConfig, false);
// Get the stream of uploads
final KStream<String, DocumentUpload> uploadStream = builder.stream(UPLOADS_TOPIC, Consumed.with(Serdes.String(), uploadSerde));
// Debug output to see the contents of the stream
uploadStream.foreach((k, v) -> System.out.println("uploadStream: Key: " + k + ", Value: " + v));
// Note, I tried to re-key the stream with the orgId field (even though it was set as the key in the driver but same problem)
final KStream<String, DocumentUpload> keyedUploadStream = uploadStream.selectKey((key, value) -> value.getOrgId());
keyedUploadStream.foreach((k, v) -> System.out.println("keyedUploadStream: Key: " + k + ", Value: " + v));
// Java 7 form used as it was easier to put in debug statements
// OrgPK is just a helper class defined in the same class
KStream<String, OrgPk> joined = keyedUploadStream.leftJoin(orgs,
new KeyValueMapper<String, DocumentUpload, String>() { /* derive a (potentially) new key by which to lookup against the table */
@Override
public String apply(String key, DocumentUpload value) {
System.out.println("1. The key passed in is: " + key);
System.out.println("2. The upload realm passed in is: " + value.getOrgId());
return value.getOrgId();
}
},
// THIS IS NEVER CALLED WITH A join() AND WHEN CALLED WITH A leftJoin() HAS A NULL ORGANISATION
new ValueJoiner<DocumentUpload, Organisation, OrgPk>() {
@Override
public OrgPk apply(DocumentUpload leftValue, Organisation rightValue) {
System.out.println("3. Value joiner has been called...");
if( null == rightValue ) {
// THIS IS ALWAYS CALLED, SO THERE IS NEVER A "MATCH"
System.out.println(" 3.1. Orgnisation is NULL");
return new OrgPk(leftValue.getRealm(), 1L);
}
System.out.println(" 3.1. Org is OK");
// Never reaches here - this is the issue i.e. there is never a match
return new OrgPk(leftValue.getOrgId(), rightValue.getPk());
}
});
因此,即使兩個鍵相同,上述聯接(或leftJoin)也永遠不會匹配! 這是主要問題。
最后,DocumentUpload的avro模式為:
{"namespace": "io.confluent.examples.streams.avro",
"type":"record",
"name":"DocumentUpload",
"fields":[
{"name": "orgId", "type":"string"},
{"name":"date", "type":"long", "logicalType":"timestamp-millis"}
]
}
因此,總而言之:
有人能幫我嗎? 我正在拔頭發試圖解決這個問題。
通過提供狀態目錄配置StreamsConfig.STATE_DIR_CONFIG,我能夠在Windows / Intellij上解決此問題。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.