

Kafka KStream to GlobalKTable join does not work with same key used

I have a very frustrating problem trying to join a KStream, which is populated by a Java driver program using KafkaProducer, to a GlobalKTable that is populated from a topic which is, in turn, filled by the JDBC connector pulling data from a MySQL table. No matter what I try, the join between the KStream and the GlobalKTable, both of which are keyed on the same value, does not work. What I mean is that the ValueJoiner is never called. I'll try to explain by showing the relevant config and code below. I appreciate any help.

I am using the latest version of the Confluent Platform.

The topic that the GlobalKTable is populated from is pulled from a single MySQL table:

Column Name/Type:
pk/bigint(20)
org_name/varchar(255)
orgId/varchar(10)

The JDBC connector configuration for this is:

name=my-demo
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
connection.url=jdbc:mysql://localhost:3306/reporting?user=root&password=XXX
table.whitelist=organisation
mode=incrementing
incrementing.column.name=pk
topic.prefix=my-
transforms=keyaddition
transforms.keyaddition.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.keyaddition.fields=orgId

I am running the JDBC connector using the command line:

connect-standalone /home/jim/platform/confluent/etc/schema-registry/connect-avro-standalone.properties /home/jim/prg/kafka/config/my.mysql.properties

This gives me a topic called my-organisation that is keyed on orgId ..... so far so good! (Note: the namespace does not seem to be set by the JDBC connector, but I don't think this is an issue; I don't know for sure.)
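To sanity-check what actually lands on that topic, a quick way (a rough sketch; exact flags may vary slightly between Confluent Platform versions) is to dump a few records together with their keys using the Avro console consumer:

kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic my-organisation \
  --from-beginning \
  --property schema.registry.url=http://localhost:8081 \
  --property print.key=true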

Now, the code. Here is how I initialise and create the GlobalKTable (relevant code shown):

final Map<String, String> serdeConfig =
    Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
        schemaRegistryUrl);

final StreamsBuilder builder = new StreamsBuilder();

final SpecificAvroSerde<Organisation> orgSerde = new SpecificAvroSerde<>();
orgSerde.configure(serdeConfig, false);

// Create the GlobalKTable from the topic that was populated using the connect-standalone command line 
final GlobalKTable<String, Organisation>
   orgs =
   builder.globalTable(ORG_TOPIC, Materialized.<String, Organisation, KeyValueStore<Bytes, byte[]>>as(ORG_STORE)
           .withKeySerde(Serdes.String())
           .withValueSerde(orgSerde));

The Avro schema from which the Organisation class is generated is defined as:

{"namespace": "io.confluent.examples.streams.avro",
 "type":"record",
 "name":"Organisation",
 "fields":[
    {"name": "pk",      "type":"long"},
    {"name": "org_name",   "type":"string"},
    {"name": "orgId",   "type":"string"}
  ]
}

Note: as described above, the orgId is set as the key on the topic using the single message transform (SMT) operation.

So, that is the GlobalKTable setup.

Now for the KStream setup (the left-hand side of the join). This has the same key (orgId) as the GlobalKTable. I use a simple driver program for this:

(The use case is that this topic will contain events associated with each organisation.)

import java.util.Date;
import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.SerializationException;

public class UploadGenerator {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
    props.put("schema.registry.url", "http://localhost:8081");
    KafkaProducer<Object, Object> producer = new KafkaProducer<>(props);

    // This schema is also used in the consumer application, or more specifically a class generated from it.
    String mySchema = "{\"namespace\": \"io.confluent.examples.streams.avro\"," +
                          "\"type\":\"record\"," +
                          "\"name\":\"DocumentUpload\"," +
                          "\"fields\":[{\"name\":\"orgId\",\"type\":\"string\"}," +
                                      "{\"name\":\"date\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}]}";

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(mySchema);

    // Just using three fictional organisations with the following orgIds/keys
    String[] ORG_ARRAY = {"002", "003", "004"};

    long count = 0;
    String key = ""; // key is the orgId
    while (true) {
      count++;
      try {
        TimeUnit.SECONDS.sleep(5);
      } catch (InterruptedException e) {
        // ignore and keep producing
      }
      GenericRecord avroRecord = new GenericData.Record(schema);
      int orgId = ThreadLocalRandom.current().nextInt(0, 2 + 1);

      avroRecord.put("orgId", ORG_ARRAY[orgId]);
      avroRecord.put("date", new Date().getTime());
      key = ORG_ARRAY[orgId];

      ProducerRecord<Object, Object> record = new ProducerRecord<>("topic_uploads", key, avroRecord);
      try {
        producer.send(record);
        producer.flush();
      } catch (SerializationException e) {
        System.out.println("Exception was generated! " + e.getMessage());
      } catch (Exception el) {
        System.out.println("Exception: " + el.getMessage());
      }
    }
  }
}

So, this generates a new event representing an upload for an organisation identified by the orgId, which is also explicitly set as the key of the ProducerRecord.

Here is the code that sets up the KStream for these events:

final SpecificAvroSerde<DocumentUpload> uploadSerde = new SpecificAvroSerde<>();
uploadSerde.configure(serdeConfig, false);

// Get the stream of uploads
final KStream<String, DocumentUpload> uploadStream = builder.stream(UPLOADS_TOPIC, Consumed.with(Serdes.String(), uploadSerde));

// Debug output to see the contents of the stream
uploadStream.foreach((k, v) -> System.out.println("uploadStream: Key: " + k + ", Value: " + v));

// Note: I tried re-keying the stream with the orgId field (even though it was already set as the key in the driver), but I get the same problem
final KStream<String, DocumentUpload> keyedUploadStream = uploadStream.selectKey((key, value) -> value.getOrgId());
keyedUploadStream.foreach((k, v) -> System.out.println("keyedUploadStream: Key: " + k + ", Value: " + v));

// Java 7 form used as it was easier to put in debug statements
// OrgPk is just a helper class defined in the same class
KStream<String, OrgPk> joined = keyedUploadStream.leftJoin(orgs,
        new KeyValueMapper<String, DocumentUpload, String>() { /* derive a (potentially) new key by which to lookup against the table */
          @Override
          public String apply(String key, DocumentUpload value) {
            System.out.println("1. The key passed in is: " + key);
            System.out.println("2. The upload realm passed in is: " + value.getOrgId());
            return value.getOrgId();
          }
        },
        // THIS IS NEVER CALLED WITH A join() AND WHEN CALLED WITH A leftJoin() HAS A NULL ORGANISATION
        new ValueJoiner<DocumentUpload, Organisation, OrgPk>() {
          @Override
          public OrgPk apply(DocumentUpload leftValue, Organisation rightValue) {
            System.out.println("3. Value joiner has been called...");
            if( null == rightValue ) {
              // THIS IS ALWAYS CALLED, SO THERE IS NEVER A "MATCH"
              System.out.println("    3.1. Orgnisation is NULL");
              return new OrgPk(leftValue.getRealm(), 1L);
            }
            System.out.println("    3.1. Org is OK");
            // Never reaches here - this is the issue i.e. there is never a match
            return new OrgPk(leftValue.getOrgId(), rightValue.getPk());
          }
        });

So, the above join (or leftJoin) never matches, even though the two keys are the same! This is the main issue.
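For debugging, one thing I can do is dump the keys that actually ended up in the GlobalKTable's store and compare them with the keys printed for keyedUploadStream. This is only a minimal sketch, assuming the running KafkaStreams instance is available in a variable called streams, using the ORG_STORE name from the Materialized above, and with the usual org.apache.kafka.streams.state imports in place:

ReadOnlyKeyValueStore<String, Organisation> store =
    streams.store(ORG_STORE, QueryableStoreTypes.<String, Organisation>keyValueStore());

// Print every key/value currently materialised from the my-organisation topic
try (KeyValueIterator<String, Organisation> it = store.all()) {
  while (it.hasNext()) {
    KeyValue<String, Organisation> entry = it.next();
    System.out.println("ORG_STORE key: '" + entry.key + "', value: " + entry.value);
  }
}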

Finally, the Avro schema for DocumentUpload is:

{"namespace": "io.confluent.examples.streams.avro",
 "type":"record",
 "name":"DocumentUpload",
 "fields":[
    {"name": "orgId",   "type":"string"},
    {"name":"date",     "type":"long",  "logicalType":"timestamp-millis"}
  ]
}

So, in summary:

  1. I have a KStream on a topic with a String key of orgId.
  2. I have a GlobalKTable on a topic, also with a String key of orgId.
  3. The join never works, even though the keys are in the GlobalKTable (at least they are in the topic underlying the GlobalKTable).

Can someone help me? I am pulling my hair out trying to figure this out.

I was able to resolve this issue on Windows/IntelliJ by providing a state directory config, StreamsConfig.STATE_DIR_CONFIG.
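For reference, a minimal sketch of what that looks like (the application id and the state directory path below are just illustrative placeholders, not values from the original setup):

Properties streamsConfig = new Properties();
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "upload-join-app"); // placeholder app id
streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Point the state directory at an explicitly writable location (placeholder path)
streamsConfig.put(StreamsConfig.STATE_DIR_CONFIG, "C:/tmp/kafka-streams");

KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig);
streams.start();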
