
Extract common values from nested dict along with the primary keys

We're working on a project where we need to find common children between parents. The dataset is a register of people's names, numbers, and their children's person numbers (around 600 persons). We want to find the parents of each child, meaning that we need to find the instances where two people share the same person number under 'Children'.

We have put all the persons into a dict, and put the children (because there can be more than one) into a separate list for each person. We have been successful in getting all the children's person numbers out of the dict, but now what? We need to know which 'parents' they are connected to and find their parent pair. Help! The code that builds the dict looks like this:

Ages = []
Gender = []   
DadAges = []  
MomAges = []
findchildren = False  
findchildren1 = False
register = dict()
index = 0

for line in infile:
    #Add all information to a dict called register:
    # The dict, register, contains a dict for each person named [1-500]
    details = line.strip()
    if line.startswith("CPR"):
        register[index] = {"CPR": details[5:]}
    if line.startswith("First name"):
        register[index].update({"First name": details[12:]})
    if line.startswith("Last name"):
        register[index].update({"Last name": details[11:]})
    if line.startswith("Height"):
        register[index].update({"Height": details[8:]})
    if line.startswith("Weight"):
        register[index].update({"Weight": details[8:]})
    if line.startswith("Eye color"):
        register[index].update({"Eye color":details[11:]})
    if line.startswith("Blood type"):
        register[index].update({"Blood type": details[12:]})
    if line.startswith("Children"):
        register[index].update({"Children": details[10:].split()})
    if line == "\n":
        index += 1

The output looks like this, where each leading number is the person's index in the register:

3 {'CPR': '230226-9781', 'First name': 'Anton', 'Last name': 'Gade', 'Height': '201', 'Weight': '65', 'Eye color': 'Black', 'Blood type': 'A+', 'Children': ['081154-2786', '120853-1151', '050354-4664']}
4 {'CPR': '120194-9148', 'First name': 'Belina', 'Last name': 'Pedersen', 'Height': '160', 'Weight': '87', 'Eye color': 'Black', 'Blood type': 'O-'}
5 {'CPR': '220567-1489', 'First name': 'Mikael', 'Last name': 'Wad', 'Height': '175', 'Weight': '86', 'Eye color': 'Green', 'Blood type': 'A-', 'Children': ['260195-4304', '081295-4166']}
6 {'CPR': '141087-7452', 'First name': 'Inger', 'Last name': 'Nielsen', 'Height': '184', 'Weight': '101', 'Eye color': 'Grey', 'Blood type': 'AB-'}

What is the best way to do this? We are pretty blank ☺️

Here is a function that will return the common children between any two of those person dicts.

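def find_common_children(parent_a, parent_b):
    # .get() with a default covers person dicts that have no "Children" key
    children_a = set(parent_a.get("Children", []))
    children_b = set(parent_b.get("Children", []))
    return children_a.intersection(children_b)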
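To apply this kind of intersection across the whole register, one option is to check every pair of persons. The following is only a sketch: the `register` contents are made-up placeholder values shaped like the dict in the question, and `common_children` and `parent_pairs` are hypothetical helper names, not part of the original code.

```python
from itertools import combinations

# Hypothetical sample shaped like the register in the question (made-up values).
register = {
    0: {"CPR": "p-0", "Children": ["c-1", "c-2"]},
    1: {"CPR": "p-1", "Children": ["c-2", "c-1", "c-3"]},
    2: {"CPR": "p-2"},  # this person has no "Children" key
    3: {"CPR": "p-3", "Children": ["c-3"]},
}


def common_children(parent_a, parent_b):
    """Return the set of children listed under both persons."""
    children_a = set(parent_a.get("Children", []))
    children_b = set(parent_b.get("Children", []))
    return children_a & children_b


def parent_pairs(register):
    """Map (CPR, CPR) pairs to their shared children, skipping pairs with none."""
    pairs = {}
    for a, b in combinations(register.values(), 2):
        shared = common_children(a, b)
        if shared:
            pairs[(a["CPR"], b["CPR"])] = shared
    return pairs


pairs = parent_pairs(register)
for (cpr_a, cpr_b), shared in pairs.items():
    print(cpr_a, cpr_b, sorted(shared))
```

With around 600 persons this compares roughly 180,000 pairs, which should be fine at this scale, though it grows quadratically with the size of the register.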

If you want to find a parent pair from a child, that's a different matter. You'll want to construct another data structure, probably something like this:

{
    child_id: [parent_a_id, parent_b_id],
    # etc. 
}

Then it's a simple lookup by child id in this dict. You can construct this dict like this:

from collections import defaultdict
# defaultdict just simplifies working with dicts whose values are collections
# in our case, we'll use lists for the values

parents_by_child = defaultdict(list)
for index, data in register.items():
    # .get() with a default covers persons that have no "Children" key;
    # alternatively, create every person with Children: [] up front
    for child in data.get("Children", []):
        # the parent's CPR lives in the person dict, not in the integer key
        parents_by_child[child].append(data["CPR"])
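Once a child-to-parents mapping exists, it can be inverted again to group children by parent pair. The sketch below assumes the register shape from the question. The sample values are made up, and `children_by_pair` is a hypothetical name introduced for illustration.

```python
from collections import defaultdict

# Hypothetical sample shaped like the register in the question (made-up values).
register = {
    0: {"CPR": "p-0", "Children": ["c-1", "c-2"]},
    1: {"CPR": "p-1", "Children": ["c-2", "c-1", "c-3"]},
    2: {"CPR": "p-2"},  # this person has no "Children" key
    3: {"CPR": "p-3", "Children": ["c-3", "c-4"]},
}

# Step 1: child CPR -> list of parent CPRs.
parents_by_child = defaultdict(list)
for data in register.values():
    for child in data.get("Children", []):
        parents_by_child[child].append(data["CPR"])

# Step 2: (parent CPR, parent CPR) -> list of shared children.
# Sorting the pair makes (a, b) and (b, a) land on the same key.
children_by_pair = defaultdict(list)
for child, parents in parents_by_child.items():
    if len(parents) == 2:  # children with only one registered parent are skipped
        children_by_pair[tuple(sorted(parents))].append(child)

for pair, children in children_by_pair.items():
    print(pair, children)
```

This visits each child entry once, so it avoids comparing every pair of persons. Children that appear under only one person (here `c-4`) are left out of `children_by_pair` but remain available in `parents_by_child`.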

I should also point out that creating a dict where each key is just a running index is probably not what you want to do. You can use a list in that case, which is automatically indexed from 0 to len - 1. Use dicts when you need to look something up by a key.
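As a rough illustration of that suggestion, the parsing loop could collect persons in a list and build a dict only where a real lookup key exists, such as the CPR number. This sketch assumes every line has the form `Key: value` and that records are separated by blank lines, which is inferred from the slicing offsets in the question rather than confirmed. `parse_register`, `sample_lines`, and `people_by_cpr` are hypothetical names, and the sample values are made up.

```python
def parse_register(lines):
    """Parse 'Key: value' lines into a list of person dicts (one per record)."""
    people = []
    person = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if person:
                people.append(person)
                person = {}
            continue
        key, _, value = line.partition(": ")
        # Children are stored as a list of person numbers; other fields stay strings.
        person[key] = value.split() if key == "Children" else value
    if person:  # the last record may not be followed by a blank line
        people.append(person)
    return people


# Hypothetical input lines (made-up values), standing in for the real file.
sample_lines = [
    "CPR: p-0\n",
    "First name: Ann\n",
    "Children: c-1 c-2\n",
    "\n",
    "CPR: p-1\n",
    "First name: Bo\n",
]

people = parse_register(sample_lines)

# A dict is useful here because CPR is a genuine lookup key.
people_by_cpr = {person["CPR"]: person for person in people}
print(people_by_cpr["p-0"].get("Children", []))
```

Persons without a `Children` line simply lack that key, so later code can read it with `.get("Children", [])`.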
