
Intersection of two lists of dictionaries by key/value pair

I have two lists of dictionaries in the format:

systolic_sex = [
        {'attribute': u'bp', 'value_d': 133.0, 'value_s': u'133', 'sid': 6}, 
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 17}, 
        {'attribute': u'bp', 'value_d': 121.0, 'value_s': u'121', 'sid': 18}, 
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 27}, 
        {'attribute': u'bp', 'value_d': 120.0, 'value_s': u'120', 'sid': 42},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},      
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 18},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 27},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]



sex = [
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},      
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]

I want to match these lists by the value of the key 'sid': if the same 'sid' value appears in both lists, that is a match; otherwise it is not. For each match, I want to append the matching dictionaries from both lists to a new list, like so:

new_set = [
        {'attribute': u'bp', 'value_d': 133.0, 'value_s': u'133', 'sid': 6}, 
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 17}, 
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},
        {'attribute': u'bp', 'value_d': 120.0, 'value_s': u'120', 'sid': 42},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]

I've tried various methods of intersecting these, including adapting the answers to "Match set of dictionaries", but I want to create a new list containing the dictionaries with matching sids, not replace values between the two lists.
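For reference, the matching described in the question can be sketched in plain Python. This is a minimal sketch, not taken from any of the answers below: the helper name match_by_sid is hypothetical. It keeps every dictionary from either list whose sid occurs in both lists, skips exact duplicates, and then sorts by sid so the result is grouped like new_set.

```python
# The question's sample data, rebuilt compactly (Python 3 strings, no u'' prefix).
bp = [(6, 133.0), (17, 127.0), (18, 121.0), (27, 127.0), (42, 120.0)]
systolic_sex = ([{'attribute': 'bp', 'value_d': v, 'value_s': str(int(v)), 'sid': s} for s, v in bp]
                + [{'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': s} for s, _ in bp])
sex = [{'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': s} for s in (6, 17, 42)]


def match_by_sid(first, second):
    """Hypothetical helper: dicts from both lists whose 'sid' occurs in both, grouped by sid."""
    common = {d['sid'] for d in first} & {d['sid'] for d in second}
    seen = set()
    matched = []
    for d in first + second:
        # dicts are unhashable, so a sorted tuple of their items serves as the duplicate check
        key = tuple(sorted(d.items()))
        if d['sid'] in common and key not in seen:
            seen.add(key)
            matched.append(d)
    # sorted() is stable, so records keep their original relative order within each sid
    return sorted(matched, key=lambda d: d['sid'])


new_set = match_by_sid(systolic_sex, sex)
```

Because the SEX records for sids 6, 17 and 42 appear in both lists, the duplicate check keeps only one copy of each.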

You may want to use pandas if you deal with data like this a lot. Your list of dictionaries is already in the record-oriented shape that pandas.DataFrame accepts, so you can do this:

import pandas

systolic_sex = pandas.DataFrame(systolic_sex)
sex = pandas.DataFrame(sex)

matches = systolic_sex[systolic_sex.sid.isin(sex.sid)]

If you want the data back in the same format as you supplied it, you can do:

output = matches.to_dict(orient='records')
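Note that isin keeps the rows in their original order (all bp rows first, then all SEX rows). Assuming you want the grouped order shown in new_set, a stable sort on sid can be added before converting back. A runnable sketch (the names df_all, df_sex and output are illustrative):

```python
import pandas as pd

# The question's sample data, rebuilt compactly.
bp = [(6, 133.0), (17, 127.0), (18, 121.0), (27, 127.0), (42, 120.0)]
systolic_sex = ([{'attribute': 'bp', 'value_d': v, 'value_s': str(int(v)), 'sid': s} for s, v in bp]
                + [{'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': s} for s, _ in bp])
sex = [{'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': s} for s in (6, 17, 42)]

df_all = pd.DataFrame(systolic_sex)
df_sex = pd.DataFrame(sex)

# boolean mask: keep the rows whose sid also appears in the second frame
matches = df_all[df_all.sid.isin(df_sex.sid)]
# 'mergesort' is a stable sort, so bp stays before SEX within each sid
matches = matches.sort_values('sid', kind='mergesort')
output = matches.to_dict(orient='records')
```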

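Adapting the answer from the post you linked. A plain dict keyed by sid keeps only the last record for each sid (here the SEX row would overwrite the bp row), so the records are grouped into lists instead:

from collections import defaultdict

# group every record by sid so that no record is overwritten
by_sid = defaultdict(list)
for e in systolic_sex:
    by_sid[e['sid']].append(e)
sex_sids = set(e['sid'] for e in sex)

matches = []
for sid, records in by_sid.items():
    if sid not in sex_sids:
        continue
    matches.extend(records)
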
>>> uniq=set(e['sid'] for e in sex) 
>>> filter(lambda d: d['sid'] in uniq, systolic_sex)
[{'attribute': u'bp', 'sid': 6L, 'value_s': u'133', 'value_d': 133.0},        
 {'attribute': u'bp', 'sid': 17L, 'value_s': u'127', 'value_d': 127.0},  
 {'attribute': u'bp', 'sid': 42L, 'value_s': u'120', 'value_d': 120.0}, 
 {'attribute': u'SEX', 'sid': 6L, 'value_s': u'M', 'value_d': 0.0}, 
 {'attribute': u'SEX', 'sid': 17L, 'value_s': u'M', 'value_d': 0.0}, 
 {'attribute': u'SEX', 'sid': 42L, 'value_s': u'M', 'value_d': 0.0}]
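The session above is Python 2 (hence the u'' and L markers). On Python 3, filter returns a lazy iterator rather than a list, so the same filter is usually written as a list comprehension. A minimal sketch with a shortened version of the question's data:

```python
# Shortened sample data: sid 6 is in both lists, sid 18 only in systolic_sex.
systolic_sex = [
    {'attribute': 'bp', 'value_d': 133.0, 'value_s': '133', 'sid': 6},
    {'attribute': 'bp', 'value_d': 121.0, 'value_s': '121', 'sid': 18},
    {'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': 6},
    {'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': 18},
]
sex = [{'attribute': 'SEX', 'value_d': 0.0, 'value_s': 'M', 'sid': 6}]

uniq = {e['sid'] for e in sex}
# a list comprehension builds a real list; on Python 3 filter() would need list(...) around it
matches = [d for d in systolic_sex if d['sid'] in uniq]
```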

I ended up using the following (as per @chtohnicdaemon):

import pandas as pd
#-----> code snipped here
#----->
# iterate over record sets returned by SQLAlchemy to populate the lists
result_right = []
for result in query_right:
    data = {'sid': result.patient_sid,
            'value_s': result.string_value,
            'value_d': result.double_value,
            'attribute': result.attribute_value}
    result_right.append(data)

result_left = []
for result in left_child:
    data = {'sid': result.patient_sid,
            'value_s': result.string_value,
            'value_d': result.double_value,
            'attribute': result.attribute_value}
    result_left.append(data)

# convert the lists of dictionaries to data frames
right = pd.DataFrame(result_right)
left = pd.DataFrame(result_left)

# keep only the rows whose sid also appears in the other frame
matches_right = right[right.sid.isin(left.sid)]
matches_left = left[left.sid.isin(right.sid)]

# combine the matched sets into a single set
frames = [matches_right, matches_left]

# concatenate, drop duplicate rows and convert back to a list of dictionaries
result = pd.concat(frames).drop_duplicates().to_dict(orient='records')
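Since the SQLAlchemy part cannot run on its own, here is a sketch of the same steps on in-memory data. The Row namedtuple is a hypothetical stand-in for the query results (only the four attribute names used above are assumed), and the two duplicated loops are folded into one helper, rows_to_records, which is also hypothetical:

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for SQLAlchemy result rows; only these four attributes are assumed.
Row = namedtuple('Row', ['patient_sid', 'string_value', 'double_value', 'attribute_value'])


def rows_to_records(rows):
    """Convert result rows into the list-of-dicts shape used in the question."""
    return [{'sid': r.patient_sid,
             'value_s': r.string_value,
             'value_d': r.double_value,
             'attribute': r.attribute_value}
            for r in rows]


def match_frames_by_sid(right_rows, left_rows):
    """Keep records from both sides whose sid occurs on both sides, without duplicate rows."""
    right = pd.DataFrame(rows_to_records(right_rows))
    left = pd.DataFrame(rows_to_records(left_rows))
    matches_right = right[right.sid.isin(left.sid)]
    matches_left = left[left.sid.isin(right.sid)]
    combined = pd.concat([matches_right, matches_left]).drop_duplicates()
    return combined.to_dict(orient='records')


query_right = [Row(6, '133', 133.0, 'bp'), Row(17, '127', 127.0, 'bp'), Row(18, '121', 121.0, 'bp'),
               Row(6, 'M', 0.0, 'SEX'), Row(17, 'M', 0.0, 'SEX'), Row(18, 'M', 0.0, 'SEX')]
left_child = [Row(6, 'M', 0.0, 'SEX'), Row(17, 'M', 0.0, 'SEX')]

result = match_frames_by_sid(query_right, left_child)
```

drop_duplicates compares all columns, so the SEX rows present in both matched sets are kept only once, and sid 18 is dropped because it only occurs on one side.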

Worked like a charm!
