How to obtain symmetrical matrix from dictionary in Python

Question

I have a basic question about data manipulation in Python.

I have the following dictionary:

mydict={('A', 'E'): 23972,
 ('A', 'D'): 10730,
 ('A', 'B'): 14748,
 ('A', 'C'): 3424,
 ('E', 'D'): 3294,
 ('E', 'B'): 16016,
 ('E', 'C'): 3373,
 ('D', 'B'): 69734,
 ('D', 'C'): 4662,
 ('B', 'C'): 159161}

If you look carefully, this is half of a symmetrical matrix with null diagonal (the 0s are not included). My final goal is to write a pandas dataframe with the full matrix.

Tentative solution

I thought about "unpacking" the dictionary obtaining 5 lists, one per label, with all the values related to the other labels, adding a 0 on the self-position of the list. For label "A" and "B", the desired result would be:

A=[0,mydict['A','B'],mydict['A','C'],mydict['A','D'],mydict['A','E']]
B=[mydict['A','B'],0,mydict['B','C'],mydict['D','B'],mydict['E','B']]

and so on for C, D, E. Notice that, in B, the 4th and 5th elements are mydict['D','B'] and mydict['E','B'], because mydict['B','D'] and mydict['B','E'] simply don't exist in mydict.

This way I could easily populate a dataframe from these lists:

import pandas as pd
df=pd.DataFrame(columns=['A','B','C','D','E'])
df['A']=A
df['B']=B

Question

I am not quite sure how to "unpack" mydict into those lists, or into any other container that could help me build the matrix. Any suggestions?
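For reference, the lists described in the tentative solution could be built roughly as follows. This is only a sketch: `pair_value` is a hypothetical helper (not part of the original post) that looks a pair up in either order and returns 0 on the diagonal, and the label order is assumed to be A to E.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}


def pair_value(pairs, a, b):
    """Hypothetical helper: value for the unordered pair {a, b}, 0 on the diagonal."""
    if a == b:
        return 0
    # Each pair is stored in only one orientation, so try both.
    return pairs[(a, b)] if (a, b) in pairs else pairs[(b, a)]


labels = ['A', 'B', 'C', 'D', 'E']  # assumed label order

# One list per label, as in the tentative solution (A, B, ... become dict entries).
columns = {col: [pair_value(mydict, col, row) for row in labels] for col in labels}

df = pd.DataFrame(columns, index=labels)
print(df)
```

Keeping the lists in a dictionary rather than in five separate variables means the dataframe can be built in a single call.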

Answer 1

One option is to reconstruct the dictionary in full matrix format and then pivot it with pandas:

import pandas as pd
mydict={('A', 'E'): 23972,
 ('A', 'D'): 10730,
 ('A', 'B'): 14748,
 ('A', 'C'): 3424,
 ('E', 'D'): 3294,
 ('E', 'B'): 16016,
 ('E', 'C'): 3373,
 ('D', 'B'): 69734,
 ('D', 'C'): 4662,
 ('B', 'C'): 159161}
 
 
# construct the full dictionary
newdict = {}

for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v
    newdict[k2, k1] = v
    newdict[k1, k1] = 0
    newdict[k2, k2] = 0

# pivot the result from long to wide
pd.Series(newdict).reset_index().pivot(index='level_0', columns='level_1', values=0)

#level_1      A       B       C      D      E
#level_0                                     
#A            0   14748    3424  10730  23972
#B        14748       0  159161  69734  16016
#C         3424  159161       0   4662   3373
#D        10730   69734    4662      0   3294
#E        23972   16016    3373   3294      0

Or as commented by @Ch3steR, you can also just do pd.Series(newdict).unstack() for the pivot.
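A minimal sketch of that unstack variant, assuming the same mydict as above. The name `df` is only illustrative.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

# Mirror every pair and add the zero diagonal, as in the loop above.
newdict = {}
for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v
    newdict[k2, k1] = v
    newdict[k1, k1] = 0
    newdict[k2, k2] = 0

# Tuple keys become a two-level index; unstack moves the second level to the columns.
df = pd.Series(newdict).unstack()
print(df)
```

Because newdict contains every (row, column) combination, the unstacked frame has no missing cells.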


Answer 2

What I can think of is to populate an array with the dict values first and then construct the dataframe.

mydict={('A', 'E'): 23972,
 ('A', 'D'): 10730,
 ('A', 'B'): 14748,
 ('A', 'C'): 3424,
 ('E', 'D'): 3294,
 ('E', 'B'): 16016,
 ('E', 'C'): 3373,
 ('D', 'B'): 69734,
 ('D', 'C'): 4662,
 ('B', 'C'): 159161}
 
import numpy as np
import pandas as pd

a = np.full((5,5),0)
ss = 'ABCDE'

for k, i in mydict.items():
    f,s = k 
    fi = ss.index(f)
    si = ss.index(s)
    a[fi,si] = i
    a[si,fi] = i

# if you want to keep the diagonal
df = pd.DataFrame(a)

# if you want to remove diagonal:
no_diag = np.delete(a,range(0,a.shape[0]**2,(a.shape[0]+1))).reshape(a.shape[0],(a.shape[1]-1))

df = pd.DataFrame(no_diag)
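The array approach above hard-codes the size (5) and the label string, and the resulting dataframe has integer row and column names. A possible variation, sketched under the assumption that the labels should simply be the sorted set of keys, derives both from the dictionary and labels the dataframe. The names `labels` and `position` are illustrative.

```python
import numpy as np
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

# Collect every label that appears in a key, in sorted order (an assumption).
labels = sorted({label for pair in mydict for label in pair})
position = {label: i for i, label in enumerate(labels)}

# Start from zeros so the diagonal is already filled.
a = np.zeros((len(labels), len(labels)), dtype=int)
for (first, second), value in mydict.items():
    i, j = position[first], position[second]
    a[i, j] = value  # given orientation
    a[j, i] = value  # mirrored orientation

df = pd.DataFrame(a, index=labels, columns=labels)
print(df)
```

The dictionary lookup in `position` replaces the repeated `ss.index(...)` calls, which may matter once there are many labels.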

Answer 3

Here is a straightforward solution that should not take too long to run either:

import numpy as np
import pandas as pd

cols = np.unique(list(mydict.keys())).ravel()

df = pd.DataFrame(0, columns=cols, index=cols)

# fill one triangle: each key is a (row, column) pair
for (row, col), value in mydict.items():
    df.loc[row, col] = value

df = df + df.T
print(df)

       A       B       C      D      E
A      0   14748    3424  10730  23972
B  14748       0  159161  69734  16016
C   3424  159161       0   4662   3373
D  10730   69734    4662      0   3294
E  23972   16016    3373   3294      0
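One thing to keep in mind with the transpose trick: it relies on each pair appearing in only one orientation and on there being no diagonal keys, otherwise the affected cells get added twice. The sketch below wraps the same idea in a hypothetical function, `symmetric_from_pairs` (not from the original answer), that checks for this first.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}


def symmetric_from_pairs(pairs):
    """Hypothetical helper: fill one triangle, then add the transpose."""
    labels = sorted({label for key in pairs for label in key})
    df = pd.DataFrame(0, index=labels, columns=labels)
    for (row, col), value in pairs.items():
        # df + df.T would double these cells, so refuse them up front.
        if row == col or (col, row) in pairs:
            raise ValueError(f"pair {(row, col)} would be counted twice")
        df.at[row, col] = value
    return df + df.T


df = symmetric_from_pairs(mydict)
print(df)
```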

Benchmarks

Adding benchmarks (303-entry input, 13-inch MacBook Pro):

import itertools
import numpy as np

kk = 'ABCDEFGHIJKLMNOPQURSUVWXYZ'
mydict = {i:np.random.randint(1,10000) for i in itertools.combinations(kk,2)}
len(mydict)
#303

fusion's approach - 392 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Psidom's approach - 4.95 ms ± 286 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Akshay Sehgal's approach - 34.8 ms ± 884 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Ben.T's approach - 4.01 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Fusion's approach is the fastest by a long shot.
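For anyone who wants to repeat such a measurement outside a notebook, a rough sketch with the standard timeit module follows. It is not the setup used for the numbers above: the input here uses the 26 distinct uppercase letters (325 pairs), `numpy_fill` is an illustrative re-implementation of the array-filling idea, and the timing will differ from machine to machine.

```python
import itertools
import string
import timeit

import numpy as np

labels = string.ascii_uppercase  # assumed input: 26 distinct labels
rng = np.random.default_rng(0)
pairs = {pair: int(rng.integers(1, 10000))
         for pair in itertools.combinations(labels, 2)}


def numpy_fill(pairs, labels):
    """Illustrative array-filling approach: write each value to both triangles."""
    position = {label: i for i, label in enumerate(labels)}
    a = np.zeros((len(labels), len(labels)), dtype=int)
    for (first, second), value in pairs.items():
        i, j = position[first], position[second]
        a[i, j] = value
        a[j, i] = value
    return a


runs = 100
seconds = timeit.timeit(lambda: numpy_fill(pairs, labels), number=runs)
per_call_us = seconds / runs * 1e6
print(f"{per_call_us:.1f} µs per call over {runs} runs")
```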

Answer 4

First create a Series from the dictionary, then unstack it to get a dataframe. Take the union of the index and the columns so that both can be reindexed with all possible labels. Finally, add the transpose of this dataframe to itself to fill in the missing values.

df_ = pd.Series(mydict).unstack(fill_value=0)
idx = df_.index.union(df_.columns)
df_ = df_.reindex(index=idx, columns=idx, fill_value=0)
df_ += df_.T

print(df_)
       A       B       C      D      E
A      0   14748    3424  10730  23972
B  14748       0  159161  69734  16016
C   3424  159161       0   4662   3373
D  10730   69734    4662      0   3294
E  23972   16016    3373   3294      0
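To see why the reindex step is needed, it may help to look at the intermediate frames. In the sketch below (the names `half` and `full` are illustrative), the unstacked half dictionary has rows A, B, D, E but columns B, C, D, E, so it cannot be added to its transpose cell by cell until both axes carry the same labels.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

# Step 1: first key element -> rows, second key element -> columns.
half = pd.Series(mydict).unstack(fill_value=0)
print(half.shape)  # rows and columns do not carry the same labels yet

# Step 2: give both axes the full label set, filling new cells with 0.
idx = half.index.union(half.columns)
full = half.reindex(index=idx, columns=idx, fill_value=0)

# Step 3: each pair is stored once, so adding the transpose mirrors it.
full = full + full.T
print(full)
```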

4 answers

solution1
3 ACCEPTED 2020-08-30 16:48:22

solution2
3 2020-08-30 16:52:34

solution3
3 2020-08-30 17:00:53

solution4
1 2020-08-30 17:01:21
