
Consuming nested JSON message from Kafka with ClickHouse

ClickHouse can definitely read JSON messages from Kafka as long as they are flat JSON documents.

We indicate this with kafka_format = 'JSONEachRow' in ClickHouse.

This is the way we are currently using it:

CREATE TABLE topic1_kafka
(
    ts Int64,
    event String,
    title String,
    msg String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1test.intra:9092,kafka2test.intra:9092,kafka3test.intra:9092',
kafka_topic_list = 'topic1', kafka_num_consumers = 1, kafka_group_name = 'ch1', 
kafka_format = 'JSONEachRow'

This is fine as long as producers send flat JSON to topic1. But not all producers send flat JSON; most of the applications generate nested JSON documents like this:

{
  "ts": 1598033988,
  "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663",
  "location": [39.920515, 32.853708],
  "stats": {
    "temp": 71.2,
    "total_memory": 32,
    "used_memory": 21.2
  }
}

Unfortunately, the JSON document above is not compatible with JSONEachRow, so ClickHouse cannot map the fields in the JSON document to columns in the table.

Is there any way to do this mapping?

EDIT: We want to map the nested JSON to a flat table like this:

CREATE TABLE topic1
(
    ts Int64,
    deviceId String,
    location_1 Float64,
    location_2 Float64,
    stats_temp Float64,
    stats_total_memory Float64,
    stats_used_memory Float64
) ENGINE = MergeTree()
ORDER BY ts;

It looks like one way is to ingest the 'raw' data as a String and then process each row with JSON functions in a consumer materialized view.

WITH '{"ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": { "temp": 71.2, "total_memory": 32, "used_memory": 21.2 }}' AS raw
SELECT 
  JSONExtractUInt(raw, 'ts') AS ts,
  JSONExtractString(raw, 'deviceId') AS deviceId,
  arrayMap(x -> toFloat32(x), JSONExtractArrayRaw(raw, 'location')) AS location,
  JSONExtract(raw, 'stats', 'Tuple(temp Float64, total_memory Float64, used_memory Float64)') AS stats,
  stats.1 AS temp,
  stats.2 AS total_memory,
  stats.3 AS used_memory;

/*
┌─────────ts─┬─deviceId─────────────────────────────┬─location──────────────┬─stats────────────────────────┬─temp─┬─total_memory─┬────────used_memory─┐
│ 1598033988 │ cf060111-dbe6-4aa8-a2d0-d5aa17f45663 │ [39.920513,32.853706] │ (71.2,32,21.200000000000003) │ 71.2 │           32 │ 21.200000000000003 │
└────────────┴──────────────────────────────────────┴───────────────────────┴──────────────────────────────┴──────┴──────────────┴────────────────────┘
*/
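The extraction above can be wired end to end: a Kafka engine table that receives each message as a single raw String (using the JSONAsString input format), plus a materialized view that applies the JSON functions and writes into the flat topic1 table from the question. This is only a sketch; the broker list, topic, and consumer group are copied from the question, while the names topic1_raw and topic1_mv are assumed:

```sql
-- Kafka engine table: each message arrives as one raw JSON string
CREATE TABLE topic1_raw
(
    raw String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1test.intra:9092,kafka2test.intra:9092,kafka3test.intra:9092',
         kafka_topic_list = 'topic1',
         kafka_num_consumers = 1,
         kafka_group_name = 'ch1',
         kafka_format = 'JSONAsString';

-- Materialized view: flatten each raw message into the target MergeTree table
CREATE MATERIALIZED VIEW topic1_mv TO topic1 AS
SELECT
    JSONExtractInt(raw, 'ts')                     AS ts,
    JSONExtractString(raw, 'deviceId')            AS deviceId,
    JSONExtractFloat(raw, 'location', 1)          AS location_1,  -- array indices are 1-based
    JSONExtractFloat(raw, 'location', 2)          AS location_2,
    JSONExtractFloat(raw, 'stats', 'temp')        AS stats_temp,
    JSONExtractFloat(raw, 'stats', 'total_memory') AS stats_total_memory,
    JSONExtractFloat(raw, 'stats', 'used_memory') AS stats_used_memory
FROM topic1_raw;
```

Because the Kafka table holds the whole message as one String column, no schema change is needed on the producer side; all the mapping lives in the view.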

Remark: for floating-point numbers, the type Float64 should be used, not Float32 (see the related ClickHouse issue 13962).


Using the standard data types requires changing the schema of the JSON:

  1. represent stats as Tuple
CREATE TABLE test_tuple_field
(
    ts Int64,
    deviceId String,
    location Array(Float32),
    stats Tuple(Float32, Float32, Float32)
) ENGINE = MergeTree()
ORDER BY ts;


INSERT INTO test_tuple_field FORMAT JSONEachRow 
{ "ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": [71.2, 32, 21.2]};
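As a quick check of the inserted row, the tuple elements can be read back by 1-based position (the positions follow the declaration order above):

```sql
-- Tuple elements are addressed by position: .1, .2, .3
SELECT
    ts,
    stats.1 AS temp,
    stats.2 AS total_memory,
    stats.3 AS used_memory
FROM test_tuple_field;
```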
  2. represent stats as Nested structure
CREATE TABLE test_nested_field
(
    ts Int64,
    deviceId String,
    location Array(Float32),
    stats Nested (temp Float32, total_memory Float32, used_memory Float32)
) ENGINE = MergeTree()
ORDER BY ts;


SET input_format_import_nested_json=1;
INSERT INTO test_nested_field FORMAT JSONEachRow 
{ "ts": 1598033988, "deviceId": "cf060111-dbe6-4aa8-a2d0-d5aa17f45663", "location": [39.920515, 32.853708], "stats": { "temp": [71.2], "total_memory": [32], "used_memory": [21.2] }};
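With the Nested layout, each stats sub-column is stored as an Array, which is why the insert above had to wrap every value in brackets. Reading a single value back therefore means indexing into each array; a quick check might look like:

```sql
-- Each stats.* sub-column is an Array; take its first element
SELECT
    ts,
    stats.temp[1]         AS temp,
    stats.total_memory[1] AS total_memory,
    stats.used_memory[1]  AS used_memory
FROM test_nested_field;
```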

See the related answer: ClickHouse JSON parse exception: Cannot parse input: expected ',' before.

I just want to point out one issue with the answer above: the Nested type does not fit the OP's JSON structure, because it requires an array in each sub-node.

