
ksql - creating a stream from a json array

My kafka topic is pushing data in this format (coming from collectd):

[{"values":[100.000080140372],"dstypes":["derive"],"dsnames":["value"],"time":1529970061.145,"interval":10.000,"host":"k5.orch","plugin":"cpu","plugin_instance":"23","type":"cpu","type_instance":"idle","meta":{"network:received":true}}]

It's a combination of arrays, ints and floats... and the whole thing is inside a json array. As a result I'm having a heck of a time using ksql to do anything with this data.

When I create a 'default' stream as

create stream cd_temp with (kafka_topic='ctd_test', value_format='json');

I get this result:

ksql> describe cd_temp;

 Field   | Type                      
-------------------------------------
 ROWTIME | BIGINT           (system) 
 ROWKEY  | VARCHAR(STRING)  (system) 
-------------------------------------

Any select will return the ROWTIME and an 8-digit hex value for ROWKEY.

I've spent some time trying to extract the json fields to no avail. What concerns me is this:

ksql> print 'ctd_test' from beginning;
Format:JSON
com.fasterxml.jackson.databind.node.ArrayNode cannot be cast to com.fasterxml.jackson.databind.node.ObjectNode

Is it possible that this topic can't be used in ksql? Is there a technique for unpacking the outer array to get to the interesting bits inside?

At the time of writing (June 2018), KSQL can't handle a JSON message where the whole thing is embedded inside a top-level array. There is a github issue to track this. I'd suggest adding a +1 vote on this issue to up the priority of it.

Also, I notice that your create stream statement is not defining the schema of the json message. While this won't help in this situation, it is something that you'll need for other JSON input formats, i.e. your create statement should be something like:

create stream cd_temp (values ARRAY<DOUBLE>, dstypes ARRAY<VARCHAR>, etc) with (kafka_topic='ctd_test', value_format='json');
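
Until that issue is fixed, one workaround is to do the unpacking outside of KSQL: consume the collectd messages, split the outer array, and republish each element as its own message on a new topic that KSQL can read. The snippet below is just a minimal sketch of that idea, assuming a broker on localhost:9092, the kafka-python client, and a hypothetical target topic called ctd_test_unwrapped.

# Minimal sketch: unwrap the top-level JSON array from 'ctd_test' and
# republish each element as a separate message on 'ctd_test_unwrapped'.
# Assumes kafka-python (pip install kafka-python) and a broker on localhost:9092.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    'ctd_test',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    consumer_timeout_ms=10000,  # stop after 10s with no new messages (one-off run)
    value_deserializer=lambda v: json.loads(v.decode('utf-8')),
)
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

for message in consumer:
    # Each collectd payload is a JSON array of metric objects.
    for metric in message.value:
        producer.send('ctd_test_unwrapped', metric)

producer.flush()

Once the elements are on their own topic, the schema-based create stream statement above can be pointed at kafka_topic='ctd_test_unwrapped' instead of 'ctd_test'.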
