
Filter nested python dict by value

I have a Python dictionary whose nesting depth I don't know in advance. Here is an example:

{
   "name":"a_struct",
   "type":"int",
   "data":{
      "type":"struct",
      "elements":[
         {
            "data":[
               {
                  "name":"test1",
                  "data_id":0,
                  "type":"uint8",
                  "wire_type":0,
                  "data":0
               },
               {
                  "name":"test2",
                  "data_id":2,
                  "type":"uint32",
                  "wire_type":2,
                  "data":0
               },
               {
                  "name":"test3",
                  "data_id":3,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               },
               {
                  "name":"test4",
                  "data_id":4,
                  "type":"uint32",
                  "wire_type":2,
                  "data":0
               },
               {
                  "name":"test5",
                  "data_id":5,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               }
            ]
         }
      ]
   }
}

My goal is to filter out every dictionary whose "name" key does not have one of the values ["test1", "test3", "test5"]. This should work for dictionaries of arbitrary nesting depth.

For the example above, the filtered result should be:

{
   "name":"a_struct",
   "type":"int",
   "data":{
      "type":"struct",
      "elements":[
         {
            "data":[
               {
                  "name":"test1",
                  "data_id":0,
                  "type":"uint8",
                  "wire_type":0,
                  "data":0
               },
               {
                  "name":"test3",
                  "data_id":3,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               },
               {
                  "name":"test5",
                  "data_id":5,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               }
            ]
         }
      ]
   }
}

I tried the dpath library (https://pypi.org/project/dpath/) with a filter function like this:

import dpath.util

def afilter(x):
    # Match only dicts whose "name" is one of the wanted values
    if isinstance(x, dict) and "name" in x:
        return x["name"] in ["test1", "test3", "test5"]
    return False

# my_dict is the dictionary shown above
result = dpath.util.search(my_dict, "**", afilter=afilter)

But the result is wrong: every other key has been filtered out, and the removed list items are left behind as null, which is not what I want:

{
   "data":{
      "elements":[
         {
            "data":[
               {
                  "name":"test1",
                  "data_id":0,
                  "type":"uint8",
                  "wire_type":0,
                  "data":0
               },
               null,
               {
                  "name":"test3",
                  "data_id":3,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               },
               null,
               {
                  "name":"test5",
                  "data_id":5,
                  "type":"int",
                  "wire_type":4,
                  "data":{
                     "type":"uint32",
                     "elements":[
                        
                     ]
                  }
               }
            ]
         }
      ]
   }
}

How can I get this right?

PS: I'm not tied to dpath; a pure-Python solution is fine too.

You can recursively process your dictionary while filtering unneeded records:

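def delete_keys(data, keys_to_keep):
    """Recursively copy `data`, dropping entries of lists stored under a
    "data" key whose "name" is not in `keys_to_keep`."""
    res = {}
    for k, v in data.items():
        if isinstance(v, dict):
            # Nested dict: recurse into it
            res[k] = delete_keys(v, keys_to_keep)
        elif isinstance(v, list):
            if k == "data":
                # Records live in lists under "data": keep only the wanted names
                v = [obj for obj in v if isinstance(obj, dict) and obj.get("name") in keys_to_keep]
            # Recurse into dict items; copy any other list items unchanged
            res[k] = [delete_keys(obj, keys_to_keep) if isinstance(obj, dict) else obj for obj in v]
        else:
            # Scalar value: copy as-is
            res[k] = v
    return res

keys_to_keep = {'test1', 'test3', 'test5'}
# `data` is the input dictionary from the question
print(delete_keys(data, keys_to_keep))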

For your input, it gives (pretty-printed):

{
    "name": "a_struct",
    "type": "int",
    "data": {
        "type": "struct",
        "elements": [
            {
                "data": [
                    {
                        "name": "test1",
                        "data_id": 0,
                        "type": "uint8",
                        "wire_type": 0,
                        "data": 0,
                    },
                    {
                        "name": "test3",
                        "data_id": 3,
                        "type": "int",
                        "wire_type": 4,
                        "data": {"type": "uint32", "elements": []},
                    },
                    {
                        "name": "test5",
                        "data_id": 5,
                        "type": "int",
                        "wire_type": 4,
                        "data": {"type": "uint32", "elements": []},
                    },
                ]
            }
        ],
    },
}
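The function above only filters lists stored under a key named "data", so it is tied to this particular schema. Below is a minimal sketch of a schema-independent variant, assuming the rule is "inside any list, drop a dict that has a "name" key whose value is not in the allowed set". That assumption matches the expected output, where the top-level "a_struct" dict and the unnamed wrapper dicts in "elements" are kept. The name `filter_by_name` and the small `sample` dict are hypothetical, for illustration only.

```python
from typing import Any, Iterable


def filter_by_name(node: Any, names_to_keep: Iterable[str]) -> Any:
    """Return a filtered copy of `node` (hypothetical helper).

    Inside any list, a dict that has a "name" key is kept only if its
    name is in `names_to_keep`. Dicts without a "name" key, dicts that
    are not list items, and scalar values are always kept.
    """
    keep = set(names_to_keep)

    def walk(obj: Any) -> Any:
        if isinstance(obj, dict):
            # Recurse into every value; a dict is never dropped at this level
            return {k: walk(v) for k, v in obj.items()}
        if isinstance(obj, list):
            # Drop named dicts whose name is not wanted, recurse into the rest
            return [
                walk(item)
                for item in obj
                if not (isinstance(item, dict) and "name" in item and item["name"] not in keep)
            ]
        # Scalars (str, int, None, ...) are returned unchanged
        return obj

    return walk(node)


# Hypothetical, trimmed-down version of the question's structure
sample = {
    "name": "a_struct",
    "data": {"elements": [{"data": [
        {"name": "test1", "data": 0},
        {"name": "test2", "data": 0},
        {"name": "test3", "data": {"elements": []}},
    ]}]},
}
result = filter_by_name(sample, {"test1", "test3", "test5"})
print([d["name"] for d in result["data"]["elements"][0]["data"]])  # ['test1', 'test3']
```

Like `delete_keys`, this builds a new structure instead of mutating the input, and it is recursive, so input nested deeper than Python's default recursion limit (1000) would need an iterative rewrite.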

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM