Comparing nested dictionaries with different keys

I am trying to compare certain values from two different sources (hence the two dictionaries) to find out which entries actually belong together. To illustrate, here are shortened versions of both dictionaries with dummy data (line breaks added for clarity):

dict_1 = {'ins1': {'Start': 100, 'End': 110, 'Size': 10},
'ins2': {'Start': 150, 'End': 250, 'Size': 100}, 
'del1': {'Start': 210, 'End': 220, 'Size': 10}, 
'del2': {'Start': 260, 'End': 360, 'Size': 100}, 
'dup1': {'Start': 340, 'End': 350, 'Size': 10, 'Duplications': 3}, 
'dup2': {'Start': 370, 'End': 470, 'Size': 100, 'Duplications': 3}}

dict_2 = {'0': {'Start': 100, 'Read': 28, 'Prec': 'PRECISE', 'Size': 10, 'End': 110},
'1': {'Start': 500, 'Read': 38, 'Prec': 'PRECISE', 'Size': 100, 'End': 600}, 
'2': {'Start': 210, 'Read': 27, 'Prec': 'PRECISE', 'Size': 10, 'End': 220}, 
'3': {'Start': 650, 'Read': 31, 'Prec': 'IMPRECISE', 'Size': 100, 'End': 750}, 
'4': {'Start': 370, 'Read': 31, 'Prec': 'PRECISE', 'Size': 100, 'End': 470}, 
'5': {'Start': 340, 'Read': 31, 'Prec': 'PRECISE', 'Size': 10, 'End': 350}, 
'6': {'Start': 810, 'Read': 36, 'Prec': 'PRECISE', 'Size': 10, 'End': 820}}

What I want to compare are the "Start" and "End" values (and some others not specified here). If they match, I want to build a new dict (dict_3) that looks similar to this:

dict_3 = {'ins1': {'Start_d1': 100, 'Start_d2': 100, 'dict_2_ID': '0', etc},
'del1': {'Start_d1': 210, 'Start_d2': 210, 'dict_2_ID': '2', etc}}

P.S. I need both Start_d1 and Start_d2 because the two values can differ slightly (±5).

I have already tried several options from Stack Overflow, such as "Concatenating dictionaries with different keys into Pandas dataframe" (which I think could work, but I had a lot of trouble with the dataframe format) and "Comparing two dictionaries in Python" (which only works if the dictionary does not have a top-level key such as ins1, ins2, etc. here).

Could someone give me a starting point to work from? I have tried many things already, and the nested dictionaries cause trouble with every solution I could find.

You can perhaps do something like this:

dict_1 = {'ins1': {'Start': 100, 'End': 110, 'Size': 10},
'ins2': {'Start': 150, 'End': 250, 'Size': 100}, 
'del1': {'Start': 210, 'End': 220, 'Size': 10}, 
'del2': {'Start': 260, 'End': 360, 'Size': 100}, 
'dup1': {'Start': 340, 'End': 350, 'Size': 10, 'Duplications': 3}, 
'dup2': {'Start': 370, 'End': 470, 'Size': 100, 'Duplications': 3}}

dict_2 = {'0': {'Start': 100, 'Read': 28, 'Prec': 'PRECISE', 'Size': 10, 'End': 110},
'1': {'Start': 500, 'Read': 38, 'Prec': 'PRECISE', 'Size': 100, 'End': 600}, 
'2': {'Start': 210, 'Read': 27, 'Prec': 'PRECISE', 'Size': 10, 'End': 220}, 
'3': {'Start': 650, 'Read': 31, 'Prec': 'IMPRECISE', 'Size': 100, 'End': 750}, 
'4': {'Start': 370, 'Read': 31, 'Prec': 'PRECISE', 'Size': 100, 'End': 470}, 
'5': {'Start': 340, 'Read': 31, 'Prec': 'PRECISE', 'Size': 10, 'End': 350}, 
'6': {'Start': 810, 'Read': 36, 'Prec': 'PRECISE', 'Size': 10, 'End': 820}}

dict_3 = {}
for d1 in dict_1:
    for d2 in dict_2:
        if dict_1[d1]["Start"] == dict_2[d2]["Start"] and dict_1[d1]["End"] == dict_2[d2]["End"]:
            dict_3[d1] = {"Start_d1": dict_1[d1]["Start"], "Start_d2": dict_2[d2]["Start"], "dict_2_ID": d2}

print(dict_3)                        
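The loop above only finds exact matches, while the question mentions that values may differ by up to ±5. A minimal sketch of a tolerance-based variant follows; the names `TOLERANCE`, `COMPARE_KEYS`, and `match_with_tolerance` are illustrative and not part of the original answer, and the sample data is shortened and slightly altered so that one pair differs. Unlike the loop above, which keeps the last matching dict_2 entry, this sketch keeps the first one.

```python
TOLERANCE = 5                     # assumed from the "+-5" in the question
COMPARE_KEYS = ("Start", "End")   # fields that must agree within the tolerance


def match_with_tolerance(d1, d2, keys=COMPARE_KEYS, tol=TOLERANCE):
    """Pair each d1 entry with the first d2 entry whose values are all within tol."""
    result = {}
    for id1, rec1 in d1.items():
        for id2, rec2 in d2.items():
            if all(abs(rec1[k] - rec2[k]) <= tol for k in keys):
                entry = {"dict_2_ID": id2}
                for k in keys:
                    entry[f"{k}_d1"] = rec1[k]
                    entry[f"{k}_d2"] = rec2[k]
                result[id1] = entry
                break  # keep only the first match for this d1 entry
    return result


# Shortened dummy data; '0' deliberately differs slightly from 'ins1'.
sample_1 = {
    "ins1": {"Start": 100, "End": 110, "Size": 10},
    "ins2": {"Start": 150, "End": 250, "Size": 100},
    "del1": {"Start": 210, "End": 220, "Size": 10},
}
sample_2 = {
    "0": {"Start": 103, "End": 108, "Size": 10},
    "1": {"Start": 500, "End": 600, "Size": 100},
    "2": {"Start": 210, "End": 220, "Size": 10},
}

matched = match_with_tolerance(sample_1, sample_2)
print(matched)
```

This is still a nested loop, so it has the same quadratic cost as the exact-match version.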

The solution above compares every entry of dict_1 with every entry of dict_2, so it runs in O(n^2) time, which becomes slow for large dictionaries.

However, to make it more efficient (O(n)), you need to transform dict_2 so that its keys are built from the "Start" and "End" values (e.g. 'S100E110'). Each check is then a constant-time dictionary lookup. After that, you can do something like:

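# dict_2 here is the re-keyed version described above (keys like 'S100E110')
key = "S" + str(dict_1[d1]["Start"]) + "E" + str(dict_1[d1]["End"])
if key in dict_2:
    pass  # add the match to dict_3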
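The re-keying idea can be sketched end to end as follows. This version swaps the string key ('S100E110') for a `(Start, End)` tuple, which avoids string formatting but is otherwise the same technique; the name `index` and the shortened data are illustrative. It handles exact matches only, and if two dict_2 entries share the same Start and End, the later one overwrites the earlier one in the index.

```python
dict_1 = {
    "ins1": {"Start": 100, "End": 110},
    "ins2": {"Start": 150, "End": 250},
    "del1": {"Start": 210, "End": 220},
    "dup2": {"Start": 370, "End": 470},
}
dict_2 = {
    "0": {"Start": 100, "End": 110},
    "1": {"Start": 500, "End": 600},
    "2": {"Start": 210, "End": 220},
    "4": {"Start": 370, "End": 470},
}

# Build the lookup table once: (Start, End) -> dict_2 ID.
index = {(rec["Start"], rec["End"]): id2 for id2, rec in dict_2.items()}

dict_3 = {}
for id1, rec in dict_1.items():
    id2 = index.get((rec["Start"], rec["End"]))  # constant-time lookup on average
    if id2 is not None:
        dict_3[id1] = {
            "Start_d1": rec["Start"],
            "Start_d2": dict_2[id2]["Start"],
            "dict_2_ID": id2,
        }

print(dict_3)
```

Building the index takes one pass over dict_2 and matching takes one pass over dict_1, so the total work grows linearly with the input size.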

You can use Pandas; here's a demo:

import pandas as pd

df1 = pd.DataFrame.from_dict(dict_1, orient='index')
df2 = pd.DataFrame.from_dict(dict_2, orient='index')

res = pd.merge(df1, df2, on=['Start', 'End', 'Size'])

print(res)

   Start  End  Size  Duplications  Read     Prec
0    210  220    10           NaN    27  PRECISE
1    340  350    10           3.0    31  PRECISE
2    370  470   100           3.0    31  PRECISE
3    100  110    10           NaN    28  PRECISE
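One caveat with the merge above: the original dictionary keys ('ins1', '0', and so on) live in the DataFrame index, and merging on columns discards them, so the result no longer says which entries were paired. A possible sketch that keeps both IDs is to turn each index into a regular column before merging. The column names `dict_1_ID` and `dict_2_ID` and the shortened data are assumptions chosen for illustration.

```python
import pandas as pd

dict_1 = {
    "ins1": {"Start": 100, "End": 110, "Size": 10},
    "ins2": {"Start": 150, "End": 250, "Size": 100},
    "del1": {"Start": 210, "End": 220, "Size": 10},
}
dict_2 = {
    "0": {"Start": 100, "Read": 28, "Prec": "PRECISE", "Size": 10, "End": 110},
    "1": {"Start": 500, "Read": 38, "Prec": "PRECISE", "Size": 100, "End": 600},
    "2": {"Start": 210, "Read": 27, "Prec": "PRECISE", "Size": 10, "End": 220},
}

# Name each index and move it into a column so it survives the merge.
df1 = pd.DataFrame.from_dict(dict_1, orient="index").rename_axis("dict_1_ID").reset_index()
df2 = pd.DataFrame.from_dict(dict_2, orient="index").rename_axis("dict_2_ID").reset_index()

res = pd.merge(df1, df2, on=["Start", "End", "Size"])

# Map each dict_1 key to the dict_2 key it was paired with.
id_pairs = res.set_index("dict_1_ID")["dict_2_ID"].to_dict()
print(id_pairs)
```

Like the original demo, this performs an exact match on the merge columns and does not handle the ±5 tolerance.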
