
Consuming nested JSON message from Kafka with ClickHouse

ClickHouse can definitely read JSON messages from Kafka as long as they are flat JSON documents.

We indicate this with kafka_format = 'JSONEachRow' in ClickHouse.

This is the way we are currently using it:

CREATE TABLE topic1_kafka
(
    ts Int64,
    event String,
    title String,
    msg String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1test.intra:9092,kafka2test.intra:9092,kafka3test.intra:9092',
kafka_topic_list = 'topic1', kafka_num_consumers = 1, kafka_group_name = 'ch1', 
kafka_format = 'JSONEachRow'

This is fine as long as producers send flat JSON to topic1. But not all producers send flat JSON; most of the applications generate nested JSON documents like this:

{
  "ts": 1598033988,
  "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663",
  "location": [39.920515, 32.853708],
  "stats": {
    "temp": 71.2,
    "total_memory": 32,
    "used_memory": 21.2
  }
}

Unfortunately, the JSON document above is not compatible with JSONEachRow, so ClickHouse cannot map the fields in the JSON document to columns in the table.

Is there any way to do this mapping?

EDIT: We want to map the nested JSON to a flat table like this:

CREATE TABLE topic1
(
    ts Int64,
    deviceId String,
    location_1 Float64,
    location_2 Float64,
    stats_temp Float64,
    stats_total_memory Float64,
    stats_used_memory Float64
) ENGINE = MergeTree()
ORDER BY ts;

It looks like one way is to ingest the 'raw' data as a String and then process each row with JSON functions in a consumer materialized view.

WITH '{"ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": { "temp": 71.2, "total_memory": 32, "used_memory": 21.2 }}' AS raw
SELECT 
  JSONExtractUInt(raw, 'ts') AS ts,
  JSONExtractString(raw, 'deviceId') AS deviceId,
  arrayMap(x -> toFloat32(x), JSONExtractArrayRaw(raw, 'location')) AS location,
  JSONExtract(raw, 'stats', 'Tuple(temp Float64, total_memory Float64, used_memory Float64)') AS stats,
  stats.1 AS temp,
  stats.2 AS total_memory,
  stats.3 AS used_memory;

/*
┌─────────ts─┬─deviceId─────────────────────────────┬─location──────────────┬─stats────────────────────────┬─temp─┬─total_memory─┬────────used_memory─┐
│ 1598033988 │ cf060111-dbe6-4aa8-a2d0-d5aa17f45663 │ [39.920513,32.853706] │ (71.2,32,21.200000000000003) │ 71.2 │           32 │ 21.200000000000003 │
└────────────┴──────────────────────────────────────┴───────────────────────┴──────────────────────────────┴──────┴──────────────┴────────────────────┘
*/
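The extraction above can be wired end to end: a Kafka engine table that receives each message as a single raw String (using the JSONAsString input format), plus a materialized view that applies the JSON functions and writes into the flat topic1 table from the question. This is only a sketch; the broker list, topic, and consumer group are copied from the question, while the names topic1_raw and topic1_mv are assumed:

```sql
-- Kafka engine table: each message arrives as one raw JSON string
CREATE TABLE topic1_raw
(
    raw String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1test.intra:9092,kafka2test.intra:9092,kafka3test.intra:9092',
         kafka_topic_list = 'topic1',
         kafka_num_consumers = 1,
         kafka_group_name = 'ch1',
         kafka_format = 'JSONAsString';

-- Materialized view: flatten each raw message into the target MergeTree table
CREATE MATERIALIZED VIEW topic1_mv TO topic1 AS
SELECT
    JSONExtractInt(raw, 'ts')                     AS ts,
    JSONExtractString(raw, 'deviceId')            AS deviceId,
    JSONExtractFloat(raw, 'location', 1)          AS location_1,  -- array indices are 1-based
    JSONExtractFloat(raw, 'location', 2)          AS location_2,
    JSONExtractFloat(raw, 'stats', 'temp')        AS stats_temp,
    JSONExtractFloat(raw, 'stats', 'total_memory') AS stats_total_memory,
    JSONExtractFloat(raw, 'stats', 'used_memory') AS stats_used_memory
FROM topic1_raw;
```

Because the Kafka table holds the whole message as one String column, no schema change is needed on the producer side; all the mapping lives in the view.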

Remark: for floating-point numbers, the type Float64 should be used, not Float32 (see the related ClickHouse issue 13962).


Using the standard data types requires changing the schema of the JSON:

  1. represent stats as Tuple
CREATE TABLE test_tuple_field
(
    ts Int64,
    deviceId String,
    location Array(Float32),
    stats Tuple(Float32, Float32, Float32)
) ENGINE = MergeTree()
ORDER BY ts;


INSERT INTO test_tuple_field FORMAT JSONEachRow 
{ "ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": [71.2, 32, 21.2]};
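As a quick check of the inserted row, the tuple elements can be read back by 1-based position (the positions follow the declaration order above):

```sql
-- Tuple elements are addressed by position: .1, .2, .3
SELECT
    ts,
    stats.1 AS temp,
    stats.2 AS total_memory,
    stats.3 AS used_memory
FROM test_tuple_field;
```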
  2. represent stats as Nested structure
CREATE TABLE test_nested_field
(
    ts Int64,
    deviceId String,
    location Array(Float32),
    stats Nested (temp Float32, total_memory Float32, used_memory Float32)
) ENGINE = MergeTree()
ORDER BY ts;


SET input_format_import_nested_json=1;
INSERT INTO test_nested_field FORMAT JSONEachRow 
{ "ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": { "temp": [71.2], "total_memory": [32], "used_memory": [21.2] }};
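With the Nested layout, each stats sub-column is stored as an Array, which is why the insert above had to wrap every value in brackets. Reading a single value back therefore means indexing into each array; a quick check might look like:

```sql
-- Each stats.* sub-column is an Array; take its first element
SELECT
    ts,
    stats.temp[1]         AS temp,
    stats.total_memory[1] AS total_memory,
    stats.used_memory[1]  AS used_memory
FROM test_nested_field;
```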

See the related answer: ClickHouse JSON parse exception: Cannot parse input: expected ',' before.

I just want to point out one issue with the answer above: the Nested type does not fit the OP's JSON structure, because it requires an array in each sub-node.

