How to create a KSQL stream with a large number of JSON fields from a topic in Kafka?

I am passing a long JSON string to a Kafka topic, for example:

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

and want to create a stream from the Kafka topic with all the fields, without specifying every field in KSQL, for example:

 CREATE STREAM pageviews_original (*) WITH \
(kafka_topic='pageviews', value_format='JSON');

If you want the field names picked up automatically by KSQL, you need to use Avro. With Avro, the schema for the data is registered in the Confluent Schema Registry, and KSQL retrieves it automatically when you use the topic.
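As a rough sketch of the Avro route: assuming the `pageviews` topic from the question held Avro-serialized values and its schema were registered in the Schema Registry that KSQL is configured to use, the column list could be left out entirely. The stream name is taken from the question; whether your topic actually meets these assumptions is something to check first.

```sql
-- Assumes the topic's values are Avro and the schema is already
-- registered in the Schema Registry that the KSQL server points at.
-- No column list: KSQL reads the columns from the registered schema.
CREATE STREAM pageviews_original
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='AVRO');

-- Inspect the columns KSQL picked up.
DESCRIBE pageviews_original;
```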

If you are using JSON, you have to tell KSQL what the columns are. You do this in the CREATE STREAM statement, using the STRUCT data type for nested elements.
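For the sample document in the question, a declaration along these lines should work. It is a sketch rather than a tested statement: the stream name `glossary_stream` and the topic name `glossary` are made up for illustration, and every field is assumed to be a string because that is all the sample shows. Older CLI versions may also need a trailing backslash to continue a statement across lines, as in the question.

```sql
-- Hypothetical stream and topic names; field names follow the sample JSON.
CREATE STREAM glossary_stream (
  glossary STRUCT<
    title VARCHAR,
    GlossDiv STRUCT<
      title VARCHAR,
      GlossList STRUCT<
        GlossEntry STRUCT<
          ID VARCHAR,
          SortAs VARCHAR,
          GlossTerm VARCHAR,
          Acronym VARCHAR,
          Abbrev VARCHAR,
          GlossDef STRUCT<
            para VARCHAR,
            GlossSeeAlso ARRAY<VARCHAR>>,
          GlossSee VARCHAR>>>>
) WITH (KAFKA_TOPIC='glossary', VALUE_FORMAT='JSON');

-- Nested fields are then reached with the -> operator.
SELECT glossary->GlossDiv->GlossList->GlossEntry->GlossTerm
FROM glossary_stream;
```

This still lists every field once, but only in the stream definition; queries afterwards can pick out whichever nested fields they need.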

You can partly work around listing every field by declaring only the top-level fields in the CREATE STREAM and then using EXTRACTJSONFIELD to access the nested elements you actually need. Be aware that there is an issue with this in 5.0.0, which will be fixed in 5.0.1. Also, you can't use this for nested arrays and similar structures, which the sample data you show does contain.
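A minimal sketch of that workaround, again with hypothetical stream and topic names (`glossary_raw`, `glossary`): the top-level `glossary` object is declared as a plain VARCHAR so that it arrives as a JSON string, and individual values are pulled out by path. Given the 5.0.0 issue mentioned above, treat this as something to verify on your own version.

```sql
-- Declare only the top-level field, as a string holding the nested JSON.
CREATE STREAM glossary_raw (glossary VARCHAR)
  WITH (KAFKA_TOPIC='glossary', VALUE_FORMAT='JSON');

-- Pull out just the nested values that are needed, by JSON path.
SELECT EXTRACTJSONFIELD(glossary, '$.title') AS title,
       EXTRACTJSONFIELD(glossary, '$.GlossDiv.GlossList.GlossEntry.GlossTerm') AS gloss_term
FROM glossary_raw;
```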
