How to deal with complex nested JSON data using python?

I have a message like the one below. It is what I get after calling ParseFromString(). These messages are transferred over ZMQ in protobuf format.

summaries {
  key: "node_A"
  value {
    value_A {
      data_1: 29994
      data_2: 0.07402841001749039
      data_3: -6.621330976486206e-05
    }
    some_activity {
      sys_activity {
        key: "arch_prctl"
        value: 174
      }
      sys_activity {
        key: "execve"
        value: 174
      }
      sys_activity {
        key: "fork"
        value: 261
      }
      some_events_A: 174
      some_events_B: 261
    }
    new_activity {
      sys_new_activity {
        key: "close"
        value: 232
      }
      sys_new_activity {
        key: "open"
        value: 116
      }
      some_new_events: 116
    }
    more_activity {
    }
    error_activity {
    }
    some_alerts {
    }
  }
}

I need to return the output as

some_activity: ["arch_prctl","execve","fork"],
some_events_A: 174
some_events_B: 261

I am able to get the value_A fields with attribute access such as value_A.data_1.

But I am finding it hard to return the remaining nested fields. I tried json.dumps, but it fails with an "Object is not JSON serializable" error.

The number of sys_activity entries in some_activity varies; it is not always 3 as in the sample above.

Let me know if the question is unclear. Assume that the service sending this message cannot be changed, so the only option is to read the message and produce the required output on the client side. Thanks in advance.

You can try this, and please don't hit me for such an ugly solution. It rewrites the text dump into JSON line by line and then loads it with json.loads:

import json

jstring = ""
with open("brokenjson.txt", "r") as f:
    for x in f:
        a = x.strip()
        if not a:
            continue  # skip blank lines (a[-1] would raise IndexError)
        if a[-1] != "{":
            a += ","  # scalar line or closing brace: add a separator
        else:
            a = a.replace("{", ": {")  # add `:` to make a key: value pair

        if ":" in a:
            b = a.split(":", 1)  # split on the first colon only
            if "\"" not in b[0]:
                a = "\"" + b[0].strip() + "\": " + b[1]  # key has no double quotes yet
        jstring += a

jstring = jstring.replace(",}", "}")  # remove trailing commas before a closing brace

if jstring[-1] == ",":
    jstring = jstring[:-1]  # drop the trailing comma at the very end

if jstring[0] != "{" and jstring[0] != "[":
    jstring = "{" + jstring + "}"  # wrap everything in one top-level object

result = json.loads(jstring)

distActivity = {}
distEvent = {}
for key, val in result["summaries"]["value"].items():
    if "activity" in key:
        if key not in distActivity:
            distActivity[key] = []
        for k, v in val.items():
            if "activity" in k:
                distActivity[key].append(v["key"])  # nested entry: keep its key
            if "event" in k:
                distEvent[k] = v  # plain counter such as some_events_A

print(distActivity)
print(distEvent)

Because your sample has duplicate keys, and json.loads keeps only the last value for a repeated key, I only got this result:

{'some_activity': ['fork'], 'new_activity': ['open'], 'more_activity': [], 'error_activity': []}
{'some_events_A': 174, 'some_events_B': 261, 'some_new_events': 116}
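If the repeated entries matter, one way around the duplicate-key loss is to skip JSON entirely and build the nested structure directly, storing every field as a list. The following is only a minimal sketch under assumptions: it expects the simple one-field-per-line layout shown in the question, and the helper names (parse_scalar, parse_text_format, summarize) are made up for illustration.

```python
def parse_scalar(token):
    """Turn a scalar token into str, int or float (best effort)."""
    if token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_text_format(text):
    """Parse the text dump into nested dicts; each field name maps to a list,
    so repeated fields are kept instead of overwritten."""
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):      # opening of a nested block
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":           # end of the current block
            stack.pop()
        else:                       # scalar field: name: value
            name, _, value = line.partition(":")
            stack[-1].setdefault(name.strip(), []).append(parse_scalar(value.strip()))
    return root


def summarize(block):
    """Split one activity block into entry keys and plain event counters."""
    keys, events = [], {}
    for name, values in block.items():
        for v in values:
            if isinstance(v, dict):
                keys.extend(v.get("key", []))
            else:
                events[name] = v
    return keys, events


sample = """
summaries {
  key: "node_A"
  value {
    some_activity {
      sys_activity {
        key: "arch_prctl"
        value: 174
      }
      sys_activity {
        key: "execve"
        value: 174
      }
      sys_activity {
        key: "fork"
        value: 261
      }
      some_events_A: 174
      some_events_B: 261
    }
  }
}
"""

tree = parse_text_format(sample)
activity = tree["summaries"][0]["value"][0]["some_activity"][0]
keys, events = summarize(activity)
print(keys)    # ['arch_prctl', 'execve', 'fork']
print(events)  # {'some_events_A': 174, 'some_events_B': 261}
```

Every field is a list here, even single-valued ones, which is why the lookups use [0]. That is clumsy but keeps the parser free of guesses about which fields repeat.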

Using MessageToJson() (from google.protobuf.json_format) on the parsed message, instead of working directly with the object filled in by ParseFromString(), solved the issue.
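For reference, a rough sketch of how that could look on the client side. It uses MessageToDict, the sibling of MessageToJson in the same module, which returns a plain dict rather than a JSON string. The real .proto schema is not shown in the question, so this assumes that summaries and sys_activity are map fields (the key/value layout of the dump suggests it, but that is a guess); to_plain_dict and activity_report are hypothetical helper names, and the data dict is a hand-written stand-in for the converted message.

```python
def to_plain_dict(message):
    """Convert a parsed protobuf message to plain Python types.
    Needs the protobuf package; imported lazily so the rest runs without it."""
    from google.protobuf.json_format import MessageToDict
    # preserving_proto_field_name keeps some_events_A instead of someEventsA
    return MessageToDict(message, preserving_proto_field_name=True)


def activity_report(data, node, section):
    """Return (entry keys, event counters) for one section of one node.
    Assumes map fields were converted to dicts (an assumption, see above)."""
    block = data["summaries"][node][section]
    keys, events = [], {}
    for name, value in block.items():
        if isinstance(value, dict):
            keys.extend(value)      # map field: the dict keys are the entry keys
        else:
            events[name] = value    # plain counter such as some_events_A
    return keys, events


# Hand-written stand-in for to_plain_dict(msg); the real shape depends on the schema.
data = {
    "summaries": {
        "node_A": {
            "some_activity": {
                "sys_activity": {"arch_prctl": 174, "execve": 174, "fork": 261},
                "some_events_A": 174,
                "some_events_B": 261,
            }
        }
    }
}

keys, events = activity_report(data, "node_A", "some_activity")
print(keys)    # ['arch_prctl', 'execve', 'fork']
print(events)  # {'some_events_A': 174, 'some_events_B': 261}
```

Two things to watch for: empty sub-messages such as more_activity may be missing from the converted dict, and 64-bit integer fields come back as strings in the JSON mapping, so the counters may need an int() conversion.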
