
KSQL: How to cast JSON string to raw JSON

I need to copy messages from one Kafka topic to another based on a specific JSON property: if the property value is "A", copy the message; otherwise, skip it. I'm trying to find the simplest way to do this with KSQL. All of my source messages contain the property I want to test, but otherwise they have very different and complex schemas. Is there a way to set this up in a "schemaless" fashion?

Source message (example):

{
    "data": {
        "propertyToCheck": "value",
        ... complex structure ...
    }
}

If I define "data" as VARCHAR in the stream, I can then examine the property with EXTRACTJSONFIELD.

CREATE OR REPLACE STREAM Test1 (
    `data` VARCHAR
)
WITH (
    kafka_topic = 'Source_Topic',
    value_format = 'JSON'
);

In this case, however, my "select" stream writes "data" as a JSON string, whereas I want raw JSON.

CREATE OR REPLACE STREAM Test2 WITH (
    kafka_topic = 'Target_Topic',
    value_format = 'JSON'
) AS
SELECT
  `data` AS `data`
FROM Test1
EMIT CHANGES;

Any ideas how to make this work?

This is a bit of a workaround, but you can achieve the desired behavior as follows: instead of declaring the message schema as VARCHAR, declare it as BYTES. Then use FROM_BYTES together with EXTRACTJSONFIELD to read the property you want to filter on from the bytes representation.
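To see why switching from VARCHAR to BYTES matters, here is a rough Python sketch (not ksqlDB code) of the two behaviors. It assumes that a VARCHAR column holds the nested object as text and that the JSON serializer then writes that text back out as a quoted string, which is the symptom described in the question. The sample message and the function names `copy_as_varchar` and `copy_as_bytes` are made up for illustration.

```python
import json

# Hypothetical source message, shaped like the example in the question.
source_value = b'{"data": {"propertyToCheck": "A", "other": [1, 2, 3]}}'


def copy_as_varchar(raw: bytes) -> bytes:
    """Mimic a VARCHAR column: the nested object is kept as text,
    then serialized again, so it ends up as a JSON string."""
    message = json.loads(raw)
    data_as_text = json.dumps(message["data"])
    return json.dumps({"data": data_as_text}).encode("utf-8")


def copy_as_bytes(raw: bytes) -> bytes:
    """Mimic a BYTES column: the value is passed through untouched."""
    return raw


varchar_copy = json.loads(copy_as_varchar(source_value))
bytes_copy = json.loads(copy_as_bytes(source_value))

print(type(varchar_copy["data"]).__name__)  # a string holding escaped JSON
print(type(bytes_copy["data"]).__name__)    # a nested JSON object
```

In the VARCHAR-style copy, a consumer has to parse "data" a second time to get the object back. In the BYTES-style copy, the target message is byte-for-byte identical to the source.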

Here's an example. First, a source stream with nested JSON data and one example row:

CREATE STREAM test (data STRUCT<FOO VARCHAR, BAR VARCHAR>) with (kafka_topic='test', value_format='json', partitions=1);
INSERT INTO test (data) VALUES (STRUCT(FOO := 'foo', BAR := 'bar'));

Now, represent the data as bytes (using the KAFKA format), instead of as JSON:

CREATE STREAM test_bytes (data BYTES) WITH (kafka_topic='test', value_format='kafka');

Next, perform the filter based on the nested JSON data:

CREATE STREAM test_filtered_bytes WITH (kafka_topic='test_filtered') AS SELECT * FROM test_bytes WHERE extractjsonfield(from_bytes(data, 'utf8'), '$.DATA.FOO') = 'foo';
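Conceptually, the WHERE clause above decodes each value as UTF-8 text, reads one JSON field, and forwards the original bytes only when that field matches. The Python sketch below imitates that logic outside ksqlDB. The function name `filter_raw_messages` and the sample values are hypothetical, and skipping values that are not valid UTF-8 JSON is a choice made for this sketch, not a statement about how ksqlDB handles them.

```python
import json
from typing import Iterable, Iterator


def filter_raw_messages(values: Iterable[bytes], expected: str) -> Iterator[bytes]:
    """Yield only the raw values whose DATA.FOO field equals `expected`."""
    for raw in values:
        try:
            doc = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # sketch assumption: ignore values that are not UTF-8 JSON
        data = doc.get("DATA") if isinstance(doc, dict) else None
        if isinstance(data, dict) and data.get("FOO") == expected:
            yield raw  # forward the original bytes, never a re-serialized copy


# Hypothetical topic contents: one match, one non-match, one malformed value.
messages = [
    b'{"DATA":{"FOO":"foo","BAR":"bar"}}',
    b'{"DATA":{"FOO":"other","BAR":"bar"}}',
    b'not json',
]
kept = list(filter_raw_messages(messages, "foo"))
print(kept)
```

Because the bytes are only inspected and never rewritten, the rest of the message can have any structure, which is what makes this approach effectively schemaless.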

The newly created topic "test_filtered" now has data in proper JSON format, analogous to the source stream "test". We can verify by representing the stream in the original format and reading it back to check:

CREATE STREAM test_filtered (data STRUCT<FOO VARCHAR, BAR VARCHAR>) WITH (kafka_topic='test_filtered', value_format='json');
SELECT * FROM test_filtered EMIT CHANGES;

I verified that these example statements work on ksqlDB 0.27.2, the latest version at the time of writing. They should behave the same on any ksqlDB version that includes the BYTES type and the relevant built-in functions.

Using ksqlDB scalar functions such as EXTRACTJSONFIELD or JSON_RECORDS might help you.
