Kafka KStream to GlobalKTable join does not work with same key used
I have a very frustrating problem trying to join a KStream, populated by a Java driver program using KafkaProducer, to a GlobalKTable that is populated from a topic which, in turn, is filled by the JDBCConnector pulling data from a MySQL table. No matter what I try, the join between the KStream and the GlobalKTable, both of which are keyed on the same value, will not work. What I mean is that the ValueJoiner is never called. I'll try to explain by showing the relevant config and code below. I appreciate any help.
I am using the latest version of the Confluent Platform.
The topic that the GlobalKTable is populated from is pulled from a single MySQL table:
Column Name/Type:
pk/bigint(20)
org_name/varchar(255)
orgId/varchar(10)
The JDBCConnector configuration for this is:
name=my-demo
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
connection.url=jdbc:mysql://localhost:3306/reporting?user=root&password=XXX
table.whitelist=organisation
mode=incrementing
incrementing.column.name=pk
topic.prefix=my-
transforms=keyaddition
transforms.keyaddition.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.keyaddition.fields=orgId
I am running the JDBC connector using the command line:
connect-standalone /home/jim/platform/confluent/etc/schema-registry/connect-avro-standalone.properties /home/jim/prg/kafka/config/my.mysql.properties
This gives me a topic called my-organisation that is keyed on orgId ..... so far so good! (Note: the namespace does not seem to be set by the JDBCConnector; I don't think this is an issue, but I don't know for sure.)
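A quick way to sanity-check what actually lands as the key on that topic is the Avro console consumer that ships with the platform, for example (bootstrap server and schema registry URL as configured above):

kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic my-organisation --from-beginning \
  --property print.key=true \
  --property schema.registry.url=http://localhost:8081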
Now, the code. Here is how I initialise and create the GlobalKTable (relevant code shown):
final Map<String, String> serdeConfig =
        Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);

final StreamsBuilder builder = new StreamsBuilder();

final SpecificAvroSerde<Organisation> orgSerde = new SpecificAvroSerde<>();
orgSerde.configure(serdeConfig, false);

// Create the GlobalKTable from the topic that was populated using the connect-standalone command line
final GlobalKTable<String, Organisation> orgs =
        builder.globalTable(ORG_TOPIC,
                Materialized.<String, Organisation, KeyValueStore<Bytes, byte[]>>as(ORG_STORE)
                        .withKeySerde(Serdes.String())
                        .withValueSerde(orgSerde));
The avro schema, from which the Organisation class is generated, is defined as:
{"namespace": "io.confluent.examples.streams.avro",
"type":"record",
"name":"Organisation",
"fields":[
{"name": "pk", "type":"long"},
{"name": "org_name", "type":"string"},
{"name": "orgId", "type":"string"}
]
}
Note: as described above, the orgId is set as the key on the topic using the single message transform (SMT) operation.
So, that is the GlobalKTable setup.
Now for the KStream setup (the left-hand side of the join). This has the same key (orgId) as the GlobalKTable. I use a simple driver program for this:
(The use case is that this topic will contain events associated with each organisation.)
import java.util.Date;
import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.SerializationException;

public class UploadGenerator {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                io.confluent.kafka.serializers.KafkaAvroSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                io.confluent.kafka.serializers.KafkaAvroSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");
        KafkaProducer<Object, Object> producer = new KafkaProducer<>(props);

        // This schema is also used in the consumer application, or more specifically a class generated from it.
        String mySchema = "{\"namespace\": \"io.confluent.examples.streams.avro\"," +
                "\"type\":\"record\"," +
                "\"name\":\"DocumentUpload\"," +
                "\"fields\":[{\"name\":\"orgId\",\"type\":\"string\"}," +
                "{\"name\":\"date\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}]}";
        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(mySchema);

        // Just using three fictional organisations with the following orgIds/keys
        String[] ORG_ARRAY = {"002", "003", "004"};
        long count = 0;
        String key = ""; // key is the orgId
        while (true) {
            count++;
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                // ignore and keep producing
            }
            GenericRecord avroRecord = new GenericData.Record(schema);
            int orgId = ThreadLocalRandom.current().nextInt(0, 2 + 1);
            avroRecord.put("orgId", ORG_ARRAY[orgId]);
            avroRecord.put("date", new Date().getTime());
            key = ORG_ARRAY[orgId];
            ProducerRecord<Object, Object> record = new ProducerRecord<>("topic_uploads", key, avroRecord);
            try {
                producer.send(record);
                producer.flush();
            } catch (SerializationException e) {
                System.out.println("Exception was generated: " + e.getMessage());
            } catch (Exception el) {
                System.out.println("Exception: " + el.getMessage());
            }
        }
    }
}
So, this generates a new event representing an upload for an organisation identified by the orgId, which is also explicitly set as the key of the ProducerRecord.
Here is the code that sets up the KStream for these events:
final SpecificAvroSerde<DocumentUpload> uploadSerde = new SpecificAvroSerde<>();
uploadSerde.configure(serdeConfig, false);

// Get the stream of uploads
final KStream<String, DocumentUpload> uploadStream =
        builder.stream(UPLOADS_TOPIC, Consumed.with(Serdes.String(), uploadSerde));

// Debug output to see the contents of the stream
uploadStream.foreach((k, v) -> System.out.println("uploadStream: Key: " + k + ", Value: " + v));

// Note: I tried re-keying the stream with the orgId field (even though it was already set as the key
// in the driver program) but the problem is the same
final KStream<String, DocumentUpload> keyedUploadStream =
        uploadStream.selectKey((key, value) -> value.getOrgId());
keyedUploadStream.foreach((k, v) -> System.out.println("keyedUploadStream: Key: " + k + ", Value: " + v));

// Java 7 (anonymous class) form used as it was easier to put in debug statements
// OrgPk is just a helper class defined in the same class
KStream<String, OrgPk> joined = keyedUploadStream.leftJoin(orgs,
        new KeyValueMapper<String, DocumentUpload, String>() {
            /* derive a (potentially) new key by which to look up against the table */
            @Override
            public String apply(String key, DocumentUpload value) {
                System.out.println("1. The key passed in is: " + key);
                System.out.println("2. The upload orgId passed in is: " + value.getOrgId());
                return value.getOrgId();
            }
        },
        // THIS IS NEVER CALLED WITH A join() AND, WHEN CALLED WITH A leftJoin(), HAS A NULL ORGANISATION
        new ValueJoiner<DocumentUpload, Organisation, OrgPk>() {
            @Override
            public OrgPk apply(DocumentUpload leftValue, Organisation rightValue) {
                System.out.println("3. Value joiner has been called...");
                if (null == rightValue) {
                    // THIS IS ALWAYS CALLED, SO THERE IS NEVER A "MATCH"
                    System.out.println(" 3.1. Organisation is NULL");
                    return new OrgPk(leftValue.getOrgId(), 1L);
                }
                System.out.println(" 3.1. Org is OK");
                // Never reaches here - this is the issue, i.e. there is never a match
                return new OrgPk(leftValue.getOrgId(), rightValue.getPk());
            }
        });
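For reference, the same join can be written more compactly in lambda form; this is a sketch equivalent in behaviour to the anonymous-class version above (minus the debug prints):

KStream<String, OrgPk> joinedLambda = keyedUploadStream.leftJoin(orgs,
        (key, upload) -> upload.getOrgId(),                   // derive the lookup key for the GlobalKTable
        (upload, org) -> (org == null)
                ? new OrgPk(upload.getOrgId(), 1L)            // no match in the GlobalKTable
                : new OrgPk(upload.getOrgId(), org.getPk())); // match found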
So, the above join (or leftJoin) never matches, even though the two keys are the same! This is the main issue.
Finally, the avro schema for the DocumentUpload is:
{"namespace": "io.confluent.examples.streams.avro",
"type":"record",
"name":"DocumentUpload",
"fields":[
{"name": "orgId", "type":"string"},
{"name":"date", "type":"long", "logicalType":"timestamp-millis"}
]
}
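For completeness, the topology is built and started in the usual Kafka Streams way once everything above has been defined; a minimal sketch (the application id here is illustrative, not my actual config):

final Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "upload-org-join-demo"); // illustrative name
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

final KafkaStreams streams = new KafkaStreams(builder.build(), streamsProps);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));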
So, in summary:
Can someone help me? I am pulling my hair out trying to figure this out.
I was able to resolve this issue on Windows/IntelliJ by providing a state directory configuration, StreamsConfig.STATE_DIR_CONFIG.
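A minimal sketch of that setting, added to the streams properties before the KafkaStreams instance is created (the directory path is illustrative):

// Explicit, writable state directory for Kafka Streams; the path is illustrative
streamsProps.put(StreamsConfig.STATE_DIR_CONFIG, "C:/tmp/kafka-streams-state");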