Parsing Nested JSON using Scala

I am looking to ingest telemetry data, and the output is a multi-layered nested JSON file. I am interested in a few very specific fields, but I am not able to parse the JSON file to get to the data.

Data Sample:

{
  "version_str": "1.0.0", "node_id_str": "router-01", "encoding_path": "sys/intf",
  "collection_id": 241466, "collection_start_time": 0, "collection_end_time": 0, "msg_timestamp": 0,
  "subscription_id": [ ], "sensor_group_id": [ ], "data_source": "DME",
  "data": {
    "interfaceEntity": {
      "attributes": { "childAction": "", "descr": "", "dn": "sys/intf", "modTs": "2017-09-19T13:24:14.751+00:00", "monPolDn": "uni/fabric/monfab-default", "persistentOnReload": "true", "status": "" },
      "children": [
        { "l3LbRtdIf": {
            "attributes": { "adminSt": "up", "childAction": "", "descr": "Nothing", "id": "lo103", "linkLog": "default", "modTs": "2017-11-06T23:18:02.974+00:00", "monPolDn": "uni/fabric/monfab-default", "name": "", "persistentOnReload": "true", "rn": "lb-[lo103]", "status": "", "uid": "0" },
            "children": [
              { "ethpmLbRtdIf": { "attributes": { "currErrIndex": "4294967295", "ifIndex": "335544423", "iod": "14", "lastErrors": "0,0,0,0", "operBitset": "", "operDescr": "Nothing", "operMtu": "1500", "operSt": "up", "operStQual": "none", "rn": "lbrtdif" } } },
              { "nwRtVrfMbr": { "attributes": { "childAction": "", "l3vmCfgFailedBmp": "", "l3vmCfgFailedTs": "00:00:00:00.000", "l3vmCfgState": "0", "modTs": "2017-11-06T23:18:02.945+00:00", "monPolDn": "", "parentSKey": "unspecified", "persistentOnReload": "true", "rn": "rtvrfMbr", "status": "", "tCl": "l3Inst", "tDn": "sys/inst-default", "tSKey": "" } } }
            ] } },
        { "l3LbRtdIf": {
            "attributes": { "adminSt": "up", "childAction": "", "descr": "Nothing", "id": "lo104", "linkLog": "default", "modTs": "2018-01-25T15:54:20.367+00:00", "monPolDn": "uni/fabric/monfab-default", "name": "", "persistentOnReload": "true", "rn": "lb-[lo104]", "status": "", "uid": "0" },
            "children": [
              { "ethpmLbRtdIf": { "attributes": { "currErrIndex": "4294967295", "ifIndex": "335544424", "iod": "77", "lastErrors": "0,0,0,0", "operBitset": "", "operDescr": "Nothing", "operMtu": "1500", "operSt": "up", "operStQual": "none", "rn": "lbrtdif" } } },
              { "nwRtVrfMbr": { "attributes": { "childAction": "", "l3vmCfgFailedBmp": "", "l3vmCfgFailedTs": "00:00:00:00.000", "l3vmCfgState": "0", "modTs": "2018-01-25T15:53:55.757+00:00", "monPolDn": "", "parentSKey": "unspecified", "persistentOnReload": "true", "rn": "rtvrfMbr", "status": "", "tCl": "l3Inst", "tDn": "sys/inst-default", "tSKey": "" } } }
            ] } },
        { "l3LbRtdIf": {
            "attributes": { "adminSt": "up", "childAction": "", "descr": "Nothing", "id": "lo101", "linkLog": "default", "modTs": "2017-11-13T21:39:58.910+00:00", "monPolDn": "uni/fabric/monfab-default", "name": "", "persistentOnReload": "true", "rn": "lb-[lo101]", "status": "", "uid": "0" },
            "children": [
              { "ethpmLbRtdIf": { "attributes": { "currErrIndex": "4294967295", "ifIndex": "335544421", "iod": "12", "lastErrors": "0,0,0,0", "operBitset": "", "operDescr": "Nothing", "operMtu": "1500", "operSt": "up", "operStQual": "none", "rn": "lbrtdif" } } },
              { "nwRtVrfMbr": { "attributes": { "childAction": "", "l3vmCfgFailedBmp": "", "l3vmCfgFailedTs": "00:00:00:00.000", "l3vmCfgState": "0", "modTs": "2017-11-13T21:39:58.880+00:00", "monPolDn": "", "parentSKey": "unspecified", "persistentOnReload": "true", "rn": "rtvrfMbr", "status": "", "tCl": "l3Inst", "tDn": "sys/inst-default", "tSKey": "" } } }
            ] } },
        { "l3LbRtdIf": {
            "attributes": { "adminSt": "up", "childAction": "", "descr": "\"^:tier2:if:loopback:mgmt:l3\"", "id": "lo0", "linkLog": "default", "modTs": "2017-09-25T20:29:54.003+00:00", "monPolDn": "uni/fabric/monfab-default", "name": "", "persistentOnReload": "true", "rn": "lb-[lo0]", "status": "", "uid": "0" },
            "children": [
              { "ethpmLbRtdIf": { "attributes": { "currErrIndex": "4294967295", "ifIndex": "335544320", "iod": "11", "lastErrors": "0,0,0,0", "operBitset": "", "operDescr": "\"^:tier2:if:loopback:mgmt:l3\"", "operMtu": "1500", "operSt": "up", "operStQual": "none", "rn": "lbrtdif" } } },
              { "nwRtVrfMbr":...

I am interested in these attributes:

|    |    |    |    |    |    |    |-- rmonIfIn: struct (nullable = true)
|    |    |    |    |    |    |    |    |-- attributes: struct (nullable = true)
|    |    |    |    |    |    |    |    |    |-- broadcastPkts: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- discards: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- errors: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- multicastPkts: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- nUcastPkts: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- packetRate: string (nullable = true)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().getOrCreate()
// The implicits (needed for the $"..." column syntax) can only be imported once `spark` exists.
import spark.implicits._

val df = spark.read.option("header","true").option("inferSchema","true").json("file:///usr/local/Projects/out.txt")

// This is the select that raises the AnalysisException shown below.
val mapDF = df.select($"node_id_str" as "nodename", $"data".getItem("InterfaceEntity").getItem("children").getItem("l1PhysIf").getItem("children").getItem("element"))

Whenever I attempt to go any deeper, I keep getting a data type error:

stringJsonDF: org.apache.spark.sql.DataFrame = [nestDevice: string]
org.apache.spark.sql.AnalysisException: cannot resolve '`data`.`InterfaceEntity`.`children`.`l1PhysIf`.`children`['element']' due to data type mismatch: argument 2 requires integral type, however, ''element'' is of string type.;;

You can use the Google Gson library, which is designed for working with JSON. It can convert any object to JSON and, of course, back again. Here is an example:

import com.google.gson.Gson;
import java.util.*;

Gson gson = new Gson();
List<Map<Long, String>> listOfMaps = new ArrayList<>();
// Create some maps here and add them to listOfMaps.
String listOfMapsInJsonFormat = gson.toJson(listOfMaps);

The sample code above converts an object to JSON. For the reverse direction, see the following:

Gson gson = new Gson();
List list = gson.fromJson(listOfMapsInJsonFormat, List.class);

The code above turns your input JSON string into a list that contains maps. Of course, the map type Gson builds from the JSON string may differ from the map type you had before converting the original object to JSON. To avoid that, you can use the TypeToken class:

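import com.google.gson.reflect.TypeToken;
import java.lang.reflect.Type;

Gson gson = new Gson();
// The anonymous subclass ({}) lets Gson capture the full generic type at runtime.
Type type = new TypeToken<ArrayList<Map<Long, String>>>(){}.getType();
ArrayList<Map<Long, String>> listOfMaps = gson.fromJson(listOfMapsInJsonFormat, type);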
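For the telemetry document in the question, binding to typed collections may be more than is needed; Gson's tree model can also walk the nesting directly. The following is only a sketch under stated assumptions: it uses Scala 2.13 (`scala.jdk.CollectionConverters`) and Gson 2.8.6 or later (for `JsonParser.parseString`), the object name `GsonTreeSketch` and the helper `loopbackIds` are made up for illustration, and the input is a trimmed-down version of the sample that keeps only two `l3LbRtdIf` entries.

```scala
import com.google.gson.JsonParser
import scala.jdk.CollectionConverters._ // Scala 2.13; on 2.12 use scala.collection.JavaConverters._

object GsonTreeSketch {
  // Hypothetical, heavily trimmed version of the telemetry sample from the question.
  val sample: String =
    """{"node_id_str":"router-01","data":{"interfaceEntity":{"children":[
      |{"l3LbRtdIf":{"attributes":{"id":"lo103","adminSt":"up"}}},
      |{"l3LbRtdIf":{"attributes":{"id":"lo104","adminSt":"up"}}}]}}}""".stripMargin

  // Walks data -> interfaceEntity -> children[] and collects the "id" attribute
  // of every child that is an l3LbRtdIf object.
  def loopbackIds(json: String): List[String] = {
    val root = JsonParser.parseString(json).getAsJsonObject
    val children = root
      .getAsJsonObject("data")
      .getAsJsonObject("interfaceEntity")
      .getAsJsonArray("children") // a JsonArray is a java.lang.Iterable[JsonElement]

    children.asScala.toList
      .map(_.getAsJsonObject)
      .filter(_.has("l3LbRtdIf")) // skip children of any other type
      .map(_.getAsJsonObject("l3LbRtdIf").getAsJsonObject("attributes").get("id").getAsString)
  }

  def main(args: Array[String]): Unit =
    loopbackIds(sample).foreach(println)
}
```

This keeps everything on a single machine, so it suits small files; for large volumes of telemetry the Spark approach from the question is likely the better fit.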

Since the fields sit inside multiple nested arrays, I assume you are interested in every occurrence of those fields per record (so if one record contains n rmonIfIn items because of the nested arrays, you would want each of them?).

If so, it makes sense to explode these nested arrays and process the expanded DataFrame.

Based on your code and the incomplete JSON example, it could look something like this:

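val nested = df
  // explode needs an array: `children` is the array, `interfaceEntity` itself is a struct
  .select(explode($"data.interfaceEntity.children").alias("l1"))
  // second array level: the children of each l1PhysIf entry
  .select(explode($"l1.l1PhysIf.children").alias("l2"))
  .select($"l2.rmonIfIn.attributes".alias("l3"))
  .select($"l3.broadcastPkts", $"l3.discards", $"l3.errors", $"l3.multicastPkts", $"l3.packetRate")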

This returns a DataFrame that could look like:

+-------------+--------+------+-------------+----------+
|broadcastPkts|discards|errors|multicastPkts|packetRate|
+-------------+--------+------+-------------+----------+
|1            |1       |1     |1            |1         |
|2            |2       |2     |2            |2         |
|3            |3       |3     |3            |3         |
|4            |4       |4     |4            |4         |
+-------------+--------+------+-------------+----------+
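The error in the question arises because `children` is an array: `getItem` on an array column expects an integer index, and `element` is only the label that `printSchema` uses for array entries, not a field name. A self-contained way to try the explode approach is sketched below. It is an illustration under assumptions, not the original poster's pipeline: the object name `ExplodeSketch`, the inline `sample` record, and its values are invented, the nesting is inferred from the schema excerpt in the question, and a local Spark 2.2+ session is assumed (for `spark.read.json` on a `Dataset[String]`).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeSketch {
  // Hypothetical record mimicking data.interfaceEntity.children[].l1PhysIf.children[].rmonIfIn
  val sample: String =
    """{"node_id_str":"router-01","data":{"interfaceEntity":{"children":[""" +
    """{"l1PhysIf":{"children":[""" +
    """{"rmonIfIn":{"attributes":{"broadcastPkts":"1","discards":"0","errors":"0"}}},""" +
    """{"rmonIfOut":{"attributes":{"broadcastPkts":"9"}}}]}},""" +
    """{"l1PhysIf":{"children":[""" +
    """{"rmonIfIn":{"attributes":{"broadcastPkts":"2","discards":"0","errors":"3"}}}]}}""" +
    """]}}}"""

  // Returns (nodename, broadcastPkts, discards, errors) for every rmonIfIn entry.
  def run(): Seq[(String, String, String, String)] = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-sketch").getOrCreate()
    import spark.implicits._
    try {
      val df = spark.read.json(Seq(sample).toDS())
      df
        // one row per interface: explode the outer `children` array
        .select($"node_id_str".as("nodename"), explode($"data.interfaceEntity.children").as("intf"))
        // one row per child object of each physical interface
        .select($"nodename", explode($"intf.l1PhysIf.children").as("child"))
        // children of other types (here: rmonIfOut) have a null rmonIfIn struct
        .where($"child.rmonIfIn".isNotNull)
        .select(
          $"nodename",
          $"child.rmonIfIn.attributes.broadcastPkts",
          $"child.rmonIfIn.attributes.discards",
          $"child.rmonIfIn.attributes.errors")
        .as[(String, String, String, String)]
        .collect()
        .toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Carrying `nodename` through each `select` keeps the router name attached to every counter row, which the shorter snippet above drops. If only a single known array position is needed, indexing with an integer (for example `getItem(0)`) avoids the explode, at the cost of ignoring all other entries.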
