I am trying to compare certain values from two different origins (hence the two dictionaries) with each other, to find out which values actually belong together. To illustrate, here is a shortened version of both dictionaries with dummy data (line breaks added for clarity):
dict_1 = {'ins1': {'Start': 100, 'End': 110, 'Size': 10},
'ins2': {'Start': 150, 'End': 250, 'Size': 100},
'del1': {'Start': 210, 'End': 220, 'Size': 10},
'del2': {'Start': 260, 'End': 360, 'Size': 100},
'dup1': {'Start': 340, 'End': 350, 'Size': 10, 'Duplications': 3},
'dup2': {'Start': 370, 'End': 470, 'Size': 100, 'Duplications': 3}}
dict_2 = {'0': {'Start': 100, 'Read': 28, 'Prec': 'PRECISE', 'Size': 10, 'End': 110},
'1': {'Start': 500, 'Read': 38, 'Prec': 'PRECISE', 'Size': 100, 'End': 600},
'2': {'Start': 210, 'Read': 27, 'Prec': 'PRECISE', 'Size': 10, 'End': 220},
'3': {'Start': 650, 'Read': 31, 'Prec': 'IMPRECISE', 'Size': 100, 'End': 750},
'4': {'Start': 370, 'Read': 31, 'Prec': 'PRECISE', 'Size': 100, 'End': 470},
'5': {'Start': 340, 'Read': 31, 'Prec': 'PRECISE', 'Size': 10, 'End': 350},
'6': {'Start': 810, 'Read': 36, 'Prec': 'PRECISE', 'Size': 10, 'End': 820}}
What I want to compare are the "Start" and "End" values (and others but not specified here). If they match, I want to make a new dict (dict_3) that looks similar to this:
dict_3 = {'ins1': {'Start_d1': 100, 'Start_d2': 100, 'dict_2_ID': '0', etc},
'del1': {'Start_d1': 210, 'Start_d2': 210, 'dict_2_ID': '2', etc}}
P.S. I need both Start_d1 and Start_d2, because the two values can differ slightly (±5).
I have already tried several approaches from Stack Overflow, such as "Concatenating dictionaries with different keys into Pandas dataframe" (which I think could work, but I had a lot of trouble with the DataFrame format) and "Comparing two dictionaries in Python" (which only works if the dictionary has no top-level keys, such as ins1, ins2, etc. here).
Could someone give me a starting point to work from? I have tried many things already, and the nested dictionaries cause trouble with every solution I could find.
You can do something like this perhaps:
dict_1 = {'ins1': {'Start': 100, 'End': 110, 'Size': 10},
'ins2': {'Start': 150, 'End': 250, 'Size': 100},
'del1': {'Start': 210, 'End': 220, 'Size': 10},
'del2': {'Start': 260, 'End': 360, 'Size': 100},
'dup1': {'Start': 340, 'End': 350, 'Size': 10, 'Duplications': 3},
'dup2': {'Start': 370, 'End': 470, 'Size': 100, 'Duplications': 3}}
dict_2 = {'0': {'Start': 100, 'Read': 28, 'Prec': 'PRECISE', 'Size': 10, 'End': 110},
'1': {'Start': 500, 'Read': 38, 'Prec': 'PRECISE', 'Size': 100, 'End': 600},
'2': {'Start': 210, 'Read': 27, 'Prec': 'PRECISE', 'Size': 10, 'End': 220},
'3': {'Start': 650, 'Read': 31, 'Prec': 'IMPRECISE', 'Size': 100, 'End': 750},
'4': {'Start': 370, 'Read': 31, 'Prec': 'PRECISE', 'Size': 100, 'End': 470},
'5': {'Start': 340, 'Read': 31, 'Prec': 'PRECISE', 'Size': 10, 'End': 350},
'6': {'Start': 810, 'Read': 36, 'Prec': 'PRECISE', 'Size': 10, 'End': 820}}
dict_3 = {}
for d1 in dict_1:
    for d2 in dict_2:
        if dict_1[d1]["Start"] == dict_2[d2]["Start"] and dict_1[d1]["End"] == dict_2[d2]["End"]:
            dict_3[d1] = {"Start_d1": dict_1[d1]["Start"], "Start_d2": dict_2[d2]["Start"], "dict_2_ID": d2}
print(dict_3)
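The question mentions that Start values may differ slightly (±5), which the exact `==` test above would miss. A minimal sketch of a relaxed comparison follows. The function name `match_with_tolerance`, the `tol` parameter, the extra `End_d1`/`End_d2` fields, and the small sample data are assumptions made for illustration, not part of the original answer. It also assumes that the tolerance applies to both "Start" and "End".

```python
def match_with_tolerance(dict_1, dict_2, tol=5):
    """Pair entries whose Start and End each differ by at most `tol`.

    `tol=5` is an assumption taken from the "+-5" remark in the question.
    """
    matches = {}
    for id_1, rec_1 in dict_1.items():
        for id_2, rec_2 in dict_2.items():
            if (abs(rec_1["Start"] - rec_2["Start"]) <= tol
                    and abs(rec_1["End"] - rec_2["End"]) <= tol):
                matches[id_1] = {
                    "Start_d1": rec_1["Start"], "Start_d2": rec_2["Start"],
                    "End_d1": rec_1["End"], "End_d2": rec_2["End"],
                    "dict_2_ID": id_2,
                }
                break  # keep only the first dict_2 entry that matches
    return matches


# Hypothetical sample data with slightly shifted coordinates.
sample_1 = {"ins1": {"Start": 100, "End": 110},
            "del1": {"Start": 210, "End": 220}}
sample_2 = {"0": {"Start": 103, "End": 108},   # within 5 of ins1
            "2": {"Start": 216, "End": 220}}   # Start is 6 away from del1

result = match_with_tolerance(sample_1, sample_2)
print(result)
```

Here only `ins1` is paired, because the Start of `del1` is 6 away from its nearest candidate. Stopping at the first match is a design choice; if several dict_2 entries can fall within the tolerance, you may prefer to collect all of them or pick the closest one.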
The solution above runs in O(n^2) time, because every entry of dict_1 is compared against every entry of dict_2, which is not very efficient. To bring it down to O(n), you'll need to transform dict_2 so that its keys are built from the "Start" and "End" values (e.g. 'S100E110'); each lookup is then a constant-time dictionary lookup. Then you'll be able to do something like:
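key = "S" + str(dict_1[d1]["Start"]) + "E" + str(dict_1[d1]["End"])  # convert the ints before concatenating
if key in dict_2:  # dict_2 here is the re-keyed version described above
    pass  # add to dict_3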
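The lookup idea above could be sketched end to end as follows. Instead of the string key 'S100E110' this sketch uses a `(Start, End)` tuple as the key, which is a substitution on my part to avoid string formatting. The helper names `build_index` and `match_exact` are hypothetical, and the sample data is a trimmed copy of the dictionaries from the question.

```python
def build_index(records):
    """Map (Start, End) -> record ID so that lookups take constant time.

    If two records share the same (Start, End), the later one wins.
    """
    return {(rec["Start"], rec["End"]): rec_id
            for rec_id, rec in records.items()}


def match_exact(dict_1, dict_2):
    """Pair dict_1 entries with dict_2 entries that have identical Start and End."""
    index_2 = build_index(dict_2)  # one pass over dict_2
    dict_3 = {}
    for id_1, rec_1 in dict_1.items():  # one pass over dict_1
        id_2 = index_2.get((rec_1["Start"], rec_1["End"]))
        if id_2 is not None:
            dict_3[id_1] = {"Start_d1": rec_1["Start"],
                            "Start_d2": dict_2[id_2]["Start"],
                            "dict_2_ID": id_2}
    return dict_3


dict_1 = {"ins1": {"Start": 100, "End": 110, "Size": 10},
          "ins2": {"Start": 150, "End": 250, "Size": 100},
          "del1": {"Start": 210, "End": 220, "Size": 10},
          "dup1": {"Start": 340, "End": 350, "Size": 10, "Duplications": 3}}
dict_2 = {"0": {"Start": 100, "End": 110, "Size": 10},
          "1": {"Start": 500, "End": 600, "Size": 100},
          "2": {"Start": 210, "End": 220, "Size": 10},
          "5": {"Start": 340, "End": 350, "Size": 10}}

dict_3 = match_exact(dict_1, dict_2)
print(dict_3)
```

Note that this exact-key lookup only finds identical coordinates; it does not cover the ±5 tolerance mentioned in the question.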
You can use Pandas; here's a demo:
import pandas as pd
df1 = pd.DataFrame.from_dict(dict_1, orient='index')
df2 = pd.DataFrame.from_dict(dict_2, orient='index')
res = pd.merge(df1, df2, on=['Start', 'End', 'Size'])
print(res)
   Start  End  Size  Duplications  Read     Prec
0    210  220    10           NaN    27  PRECISE
1    340  350    10           3.0    31  PRECISE
2    370  470   100           3.0    31  PRECISE
3    100  110    10           NaN    28  PRECISE
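The merged frame above no longer shows which dict_1 key matched which dict_2 key, which the question asks for via 'dict_2_ID'. One possible way to keep both identifiers is to turn each index into a named column before merging. The column names `dict_1_ID` and `dict_2_ID` and the trimmed sample data are assumptions for illustration.

```python
import pandas as pd

dict_1 = {"ins1": {"Start": 100, "End": 110, "Size": 10},
          "ins2": {"Start": 150, "End": 250, "Size": 100},
          "del1": {"Start": 210, "End": 220, "Size": 10},
          "dup2": {"Start": 370, "End": 470, "Size": 100, "Duplications": 3}}
dict_2 = {"0": {"Start": 100, "Read": 28, "Prec": "PRECISE", "Size": 10, "End": 110},
          "1": {"Start": 500, "Read": 38, "Prec": "PRECISE", "Size": 100, "End": 600},
          "2": {"Start": 210, "Read": 27, "Prec": "PRECISE", "Size": 10, "End": 220},
          "4": {"Start": 370, "Read": 31, "Prec": "PRECISE", "Size": 100, "End": 470}}

# Name each index and move it into a regular column so it survives the merge.
df1 = (pd.DataFrame.from_dict(dict_1, orient="index")
       .rename_axis("dict_1_ID").reset_index())
df2 = (pd.DataFrame.from_dict(dict_2, orient="index")
       .rename_axis("dict_2_ID").reset_index())

# Inner join: only rows with identical Start, End and Size are kept.
res = pd.merge(df1, df2, on=["Start", "End", "Size"])

# Mapping from each dict_1 key to the dict_2 key it matched.
pairs = dict(zip(res["dict_1_ID"], res["dict_2_ID"]))
print(pairs)
```

As with the merge above, this matches on exact equality only, so entries that differ by a few positions would not be paired.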