
Confluent 4.1.0 -> KSQL: STREAM-TABLE join -> table data null

STEP 1: Run the producer to create sample data

./bin/kafka-avro-console-producer \
         --broker-list localhost:9092 --topic stream-test-topic \
         --property schema.registry.url=http://localhost:8081 \
         --property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"DEAL_ID","type":"string"},{"name":"DEAL_EXPENSE_CODE","type":"string"},{"name":"DEAL_BRANCH","type":"string"}]}'

Sample Data:

{"DEAL_ID":"deal002", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal003", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal004", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal005", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal006", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal007", "DEAL_EXPENSE_CODE":"EXP001", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal008", "DEAL_EXPENSE_CODE":"EXP002", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal009", "DEAL_EXPENSE_CODE":"EXP003", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal010", "DEAL_EXPENSE_CODE":"EXP004", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal011", "DEAL_EXPENSE_CODE":"EXP005", "DEAL_BRANCH":"AMSTERDAM"}
{"DEAL_ID":"deal012", "DEAL_EXPENSE_CODE":"EXP006", "DEAL_BRANCH":"AMSTERDAM"}

STEP 2: Open another terminal and run the consumer to test the data.

./bin/kafka-avro-console-consumer --topic stream-test-topic \
         --bootstrap-server localhost:9092 \
         --property schema.registry.url=http://localhost:8081 \
         --from-beginning

STEP 3: Open another terminal and run the producer.

./bin/kafka-avro-console-producer \
         --broker-list localhost:9092 --topic expense-test-topic \
         --property "parse.key=true" \
         --property "key.separator=:" \
         --property schema.registry.url=http://localhost:8081 \
         --property key.schema='"string"' \
         --property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"EXPENSE_CODE","type":"string"},{"name":"EXPENSE_DESC","type":"string"}]}'

Data:

"pk1":{"EXPENSE_CODE":"EXP001", "EXPENSE_DESC":"Regulatory Deposit"}
"pk2":{"EXPENSE_CODE":"EXP002", "EXPENSE_DESC":"ABC - Sofia"}
"pk3":{"EXPENSE_CODE":"EXP003", "EXPENSE_DESC":"Apple Corporation"}
"pk4":{"EXPENSE_CODE":"EXP004", "EXPENSE_DESC":"Confluent Europe"}
"pk5":{"EXPENSE_CODE":"EXP005", "EXPENSE_DESC":"Air India"}
"pk6":{"EXPENSE_CODE":"EXP006", "EXPENSE_DESC":"KLM International"}

STEP 4: Open another terminal and run the consumer.

./bin/kafka-avro-console-consumer --topic expense-test-topic \
         --bootstrap-server localhost:9092 \
         --property "parse.key=true" \
         --property "key.separator=:" \
         --property schema.registry.url=http://localhost:8081 \
         --from-beginning

STEP 5: Log in to the KSQL client.

./bin/ksql http://localhost:8088

Create the following stream and table, then run the join query.

KSQL:

STREAM:

    CREATE STREAM SAMPLE_STREAM 
       (DEAL_ID VARCHAR, DEAL_EXPENSE_CODE varchar, DEAL_BRANCH VARCHAR) 
       WITH (kafka_topic='stream-test-topic',value_format='AVRO', key = 'DEAL_ID');

TABLE:

CREATE TABLE SAMPLE_TABLE 
   (EXPENSE_CODE varchar, EXPENSE_DESC VARCHAR)
   WITH (kafka_topic='expense-test-topic',value_format='AVRO', key = 'EXPENSE_CODE');

The following is the join query and its output:

ksql> SELECT STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC 
       from SAMPLE_STREAM STREAM1 LEFT JOIN SAMPLE_TABLE TABLE1 
       ON STREAM1.DEAL_EXPENSE_CODE = TABLE1.EXPENSE_CODE  
       WINDOW TUMBLING (SIZE 3 MINUTE) 
       GROUP BY STREAM1.DEAL_EXPENSE_CODE, TABLE1.EXPENSE_DESC;

EXP001 | null
EXP001 | null
EXP002 | null
EXP003 | null
EXP004 | null
EXP005 | null
EXP006 | null
EXP002 | null
EXP002 | null

tl;dr: Your table data needs to be keyed on the column on which you're joining.
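For illustration, if you control the upstream producer, you could satisfy this from the start by keying the expense records on EXPENSE_CODE rather than an unrelated pk<x> value. A minimal sketch reusing the exact flags from STEP 3 (only the keys in the input lines change):

./bin/kafka-avro-console-producer \
         --broker-list localhost:9092 --topic expense-test-topic \
         --property "parse.key=true" \
         --property "key.separator=:" \
         --property schema.registry.url=http://localhost:8081 \
         --property key.schema='"string"' \
         --property value.schema='{"type":"record","name":"dealRecord","fields":[{"name":"EXPENSE_CODE","type":"string"},{"name":"EXPENSE_DESC","type":"string"}]}'

"EXP001":{"EXPENSE_CODE":"EXP001", "EXPENSE_DESC":"Regulatory Deposit"}
"EXP002":{"EXPENSE_CODE":"EXP002", "EXPENSE_DESC":"ABC - Sofia"}

With the messages keyed this way, the re-keying workaround described below should not be necessary. If the data is already in the topic keyed differently, read on.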

Using the sample data above, here's how to investigate and fix.

  1. Use KSQL to check the data in the topics (no need for kafka-avro-console-consumer). The format of the output is: timestamp, key, value.

    • stream:

       ksql> print 'stream-test-topic' from beginning;
       Format:AVRO
       30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"}
       30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal003", "DEAL_EXPENSE_CODE": "EXP003", "DEAL_BRANCH": "AMSTERDAM"}
       30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal004", "DEAL_EXPENSE_CODE": "EXP004", "DEAL_BRANCH": "AMSTERDAM"}
    • table:

       ksql> print 'expense-test-topic' from beginning;
       Format:AVRO
       30/04/18 16:10:52 BST, pk1, {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"}
       30/04/18 16:10:52 BST, pk2, {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"}
       30/04/18 16:10:52 BST, pk3, {"EXPENSE_CODE": "EXP003", "EXPENSE_DESC": "Apple Corporation"}
       30/04/18 16:10:52 BST, pk4, {"EXPENSE_CODE": "EXP004", "EXPENSE_DESC": "Confluent Europe"}
       30/04/18 16:10:52 BST, pk5, {"EXPENSE_CODE": "EXP005", "EXPENSE_DESC": "Air India"}
       30/04/18 16:10:52 BST, pk6, {"EXPENSE_CODE": "EXP006", "EXPENSE_DESC": "KLM International"}

    At this point, note that the key (pk<x>) does not match the column on which we will be joining.

  2. Register the two topics:

     ksql> CREATE STREAM deals WITH (KAFKA_TOPIC='stream-test-topic', VALUE_FORMAT='AVRO');

      Message
     ----------------
      Stream created
     ----------------

     ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');

      Message
     ---------------
      Table created
     ---------------
  3. Tell KSQL to query events from the beginning of each topic

     ksql> SET 'auto.offset.reset' = 'earliest';
     Successfully changed local property 'auto.offset.reset' from 'null' to 'earliest'
  4. Validate that the table's declared key per the DDL (KEY='EXPENSE_CODE') matches the actual key of the underlying Kafka messages (available through the ROWKEY system column):

     ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
     pk1 | EXP001
     pk2 | EXP002
     pk3 | EXP003
     pk4 | EXP004
     pk5 | EXP005
     pk6 | EXP006

    The keys don't match. Our join is doomed!

  5. Magic workaround: let's rekey the topic using KSQL!

    • Register the table's source topic as a KSQL STREAM:

       ksql> CREATE STREAM expense_codes_stream WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO');

        Message
       ----------------
        Stream created
       ----------------
    • Create a derived stream, keyed on the correct column. This is underpinned by a re-keyed Kafka topic.

       ksql> CREATE STREAM EXPENSE_CODES_REKEY AS SELECT * FROM expense_codes_stream PARTITION BY EXPENSE_CODE;

        Message
       ----------------------------
        Stream created and running
       ----------------------------
    • Re-register the KSQL TABLE on top of the re-keyed topic:

       ksql> DROP TABLE expense_codes_table;

        Message
       ----------------------------------------
        Source EXPENSE_CODES_TABLE was dropped
       ----------------------------------------

       ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='EXPENSE_CODES_REKEY', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');

        Message
       ---------------
        Table created
       ---------------
    • Check the keys (declared vs message) match on the new table:

       ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
       EXP005 | EXP005
       EXP001 | EXP001
       EXP002 | EXP002
       EXP003 | EXP003
       EXP006 | EXP006
       EXP004 | EXP004
  6. Successful join:

     ksql> SELECT D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC \
             FROM deals D \
             LEFT JOIN expense_codes_table E \
             ON D.DEAL_EXPENSE_CODE = E.EXPENSE_CODE \
             WINDOW TUMBLING (SIZE 3 MINUTE) \
             GROUP BY D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC;
     EXP006 | KLM International
     EXP003 | Apple Corporation
     EXP002 | ABC - Sofia
     EXP004 | Confluent Europe
     EXP001 | Regulatory Deposit
     EXP005 | Air India
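As a side note, the WINDOW TUMBLING / GROUP BY in the original query turns the join into a windowed aggregate. If plain row-by-row enrichment is all that's required, a non-windowed version of the same join over the same stream and table should also return the descriptions; a sketch:

     ksql> SELECT D.DEAL_ID, D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC \
             FROM deals D \
             LEFT JOIN expense_codes_table E \
             ON D.DEAL_EXPENSE_CODE = E.EXPENSE_CODE;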
