I have a basic question about data manipulation in Python.
I have the following dictionary:
mydict={('A', 'E'): 23972,
('A', 'D'): 10730,
('A', 'B'): 14748,
('A', 'C'): 3424,
('E', 'D'): 3294,
('E', 'B'): 16016,
('E', 'C'): 3373,
('D', 'B'): 69734,
('D', 'C'): 4662,
('B', 'C'): 159161}
If you look carefully, this is half of a symmetric matrix with a zero diagonal (the 0s are not included). My final goal is to build a pandas DataFrame holding the full matrix.
Tentative solution
I thought about "unpacking" the dictionary into 5 lists, one per label, each holding the values related to the other labels, with a 0 in the label's own position. For labels "A" and "B", the desired result would be:
A=[0,mydict['A','B'],mydict['A','C'],mydict['A','D'],mydict['A','E']]
B=[mydict['A','B'],0,mydict['B','C'],mydict['D','B'],mydict['E','B']]
and so on for C, D, E. Notice that in B the 4th and 5th elements are mydict['D','B'] and mydict['E','B'], because the keys ('B','D') and ('B','E') simply don't exist in mydict.
This way I could easily populate a dataframe from these lists:
import pandas as pd
df=pd.DataFrame(columns=['A','B','C','D','E'])
df['A']=A
df['B']=B
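The list-building step described above can be sketched as follows. This is a minimal sketch, assuming every unordered pair is stored exactly once in one of its two orders; the helper `lookup` and the variables `labels` and `columns` are hypothetical names introduced here for illustration.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

labels = ['A', 'B', 'C', 'D', 'E']

def lookup(d, r, c):
    """Hypothetical helper: value for the pair (r, c) in either order, 0 on the diagonal."""
    if r == c:
        return 0
    # each unordered pair is stored only once, so try both orders
    return d[r, c] if (r, c) in d else d[c, r]

# one list per label, in the order given by `labels`
columns = {r: [lookup(mydict, r, c) for c in labels] for r in labels}

df = pd.DataFrame(columns, index=labels)
print(df)
```

Because the matrix is symmetric, each list can serve as either a row or a column, so passing them as columns gives the full matrix directly.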
Question
I am not quite sure how I can "unpack" mydict into those lists, or into any other container that could help me build the matrix. Any suggestions?
One option is to reconstruct the dictionary in full matrix format and then pivot it with pandas:
import pandas as pd
mydict={('A', 'E'): 23972,
('A', 'D'): 10730,
('A', 'B'): 14748,
('A', 'C'): 3424,
('E', 'D'): 3294,
('E', 'B'): 16016,
('E', 'C'): 3373,
('D', 'B'): 69734,
('D', 'C'): 4662,
('B', 'C'): 159161}
# construct the full dictionary
newdict = {}
for (k1, k2), v in mydict.items():
newdict[k1, k2] = v
newdict[k2, k1] = v
newdict[k1, k1] = 0
newdict[k2, k2] = 0
# pivot the result from long to wide
pd.Series(newdict).reset_index().pivot(index='level_0', columns='level_1', values=0)
#level_1 A B C D E
#level_0
#A 0 14748 3424 10730 23972
#B 14748 0 159161 69734 16016
#C 3424 159161 0 4662 3373
#D 10730 69734 4662 0 3294
#E 23972 16016 3373 3294 0
Or, as commented by @Ch3steR, you can also just use pd.Series(newdict).unstack() for the pivot.
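A self-contained sketch of the unstack variant, reusing the `newdict` construction from this answer. It relies on pandas turning tuple keys into a two-level MultiIndex when the dictionary is passed to `pd.Series`; `unstack()` then moves the second level into the columns.

```python
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

# rebuild the full dictionary: both orders of every pair plus a 0 diagonal
newdict = {}
for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v
    newdict[k2, k1] = v
    newdict[k1, k1] = 0
    newdict[k2, k2] = 0

# tuple keys -> two-level MultiIndex; unstack() pivots the second level into columns
df = pd.Series(newdict).unstack()
print(df)
```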
Another option is to populate a NumPy array with the dict values first and then construct the DataFrame from it.
mydict={('A', 'E'): 23972,
('A', 'D'): 10730,
('A', 'B'): 14748,
('A', 'C'): 3424,
('E', 'D'): 3294,
('E', 'B'): 16016,
('E', 'C'): 3373,
('D', 'B'): 69734,
('D', 'C'): 4662,
('B', 'C'): 159161}
import numpy as np
import pandas as pd
a = np.full((5,5),0)
ss = 'ABCDE'
for k, i in mydict.items():
f,s = k
fi = ss.index(f)
si = ss.index(s)
a[fi,si] = i
a[si,fi] = i
# if you want to keep the diagonal (labels taken from ss)
df = pd.DataFrame(a, index=list(ss), columns=list(ss))
# if you want to remove diagonal:
no_diag = np.delete(a,range(0,a.shape[0]**2,(a.shape[0]+1))).reshape(a.shape[0],(a.shape[1]-1))
df = pd.DataFrame(no_diag)
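A possible variation on this answer, sketched under the assumption that the labels can be derived from the keys instead of being hard-coded as 'ABCDE'. It replaces the per-item `ss.index` lookups with a dictionary (`pos`, a hypothetical name) and fills both halves at once with NumPy fancy indexing.

```python
import numpy as np
import pandas as pd

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

labels = sorted({k for pair in mydict for k in pair})   # every label seen in the keys
pos = {label: i for i, label in enumerate(labels)}       # label -> row/column position

# positions of each stored pair and the matching values (dict order is preserved)
rows = np.array([pos[r] for r, _ in mydict])
cols = np.array([pos[c] for _, c in mydict])
vals = np.array(list(mydict.values()))

a = np.zeros((len(labels), len(labels)), dtype=vals.dtype)
a[rows, cols] = vals   # the cells given in the dict
a[cols, rows] = vals   # their mirror images
df = pd.DataFrame(a, index=labels, columns=labels)
print(df)
```

The diagonal stays 0 because `np.zeros` initialises it and no key has the same label twice.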
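Here is a straightforward solution that should also run reasonably fast:
import numpy as np
import pandas as pd

# sorted unique labels from all keys
cols = np.unique(list(mydict.keys()))
df = pd.DataFrame(0, columns=cols, index=cols)
# each pair is stored once, so only one of its two mirrored cells is filled here
for (r, c), v in mydict.items():
    df.loc[r, c] = v
# adding the transpose fills in the mirrored cells
df = df + df.T
print(df)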
A B C D E
A 0 14748 3424 10730 23972
B 14748 0 159161 69734 16016
C 3424 159161 0 4662 3373
D 10730 69734 4662 0 3294
E 23972 16016 3373 3294 0
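Whichever approach is used, a quick sanity check is that the result has matching row and column labels, is symmetric and has a zero diagonal. A small sketch of such a check follows; `is_symmetric_zero_diag` is a hypothetical helper name, and the usage part rebuilds the matrix with the `df + df.T` approach from this answer.

```python
import numpy as np
import pandas as pd

def is_symmetric_zero_diag(df: pd.DataFrame) -> bool:
    """True if df has identical row/column labels, is symmetric and has a zero diagonal."""
    if list(df.index) != list(df.columns):
        return False
    a = df.to_numpy()
    return bool((a == a.T).all() and (np.diag(a) == 0).all())

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

cols = np.unique(list(mydict.keys()))
df = pd.DataFrame(0, columns=cols, index=cols)
for (r, c), v in mydict.items():
    df.loc[r, c] = v
df = df + df.T
print(is_symmetric_zero_diag(df))  # True
```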
Benchmarks
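Adding benchmarks (303-entry input, MacBook Pro 13"):
import itertools
import numpy as np

# note: 'U' appears twice and 'T' is missing, so this yields 303 distinct keys rather than 325
kk = 'ABCDEFGHIJKLMNOPQURSUVWXYZ'
mydict = {i: np.random.randint(1, 10000) for i in itertools.combinations(kk, 2)}
len(mydict)
#303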
Fusion's approach is the fastest by a long shot.
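The timing results themselves are not reproduced here. A sketch of how such a comparison could be rerun with `timeit` follows; it assumes 26 distinct labels (so 325 pairs, unlike `kk` above), a fixed random seed, and two hypothetical wrapper functions around approaches from this thread. It prints no reference numbers, since those depend on the machine.

```python
import itertools
import string
import timeit

import numpy as np
import pandas as pd

# 26 distinct labels -> 325 unordered pairs
labels = string.ascii_uppercase
rng = np.random.default_rng(0)
mydict = {pair: int(rng.integers(1, 10000)) for pair in itertools.combinations(labels, 2)}

def via_loc(d):
    """Hypothetical wrapper: per-cell .loc writes, then add the transpose."""
    cols = np.unique(list(d.keys()))
    df = pd.DataFrame(0, columns=cols, index=cols)
    for (r, c), v in d.items():
        df.loc[r, c] = v
    return df + df.T

def via_unstack(d):
    """Hypothetical wrapper: unstack, reindex on the label union, add the transpose."""
    df = pd.Series(d).unstack(fill_value=0)
    idx = df.index.union(df.columns)
    df = df.reindex(index=idx, columns=idx, fill_value=0)
    return df + df.T

# the two approaches must agree before their timings mean anything
assert via_loc(mydict).equals(via_unstack(mydict))

for fn in (via_loc, via_unstack):
    t = timeit.timeit(lambda: fn(mydict), number=10)
    print(f"{fn.__name__}: {t / 10 * 1000:.2f} ms per call")
```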
First create a Series from the dictionary and unstack it to get a DataFrame. Then take the union of the index and the columns, so that both can be reindexed with every possible label. Finally, add the transpose of this DataFrame to itself to fill in the missing values.
import pandas as pd

df_ = pd.Series(mydict).unstack(fill_value=0)
idx = df_.index.union(df_.columns)
df_ = df_.reindex(index=idx, columns=idx, fill_value=0)
df_ += df_.T
print(df_)
A B C D E
A 0 14748 3424 10730 23972
B 14748 0 159161 69734 16016
C 3424 159161 0 4662 3373
D 10730 69734 4662 0 3294
E 23972 16016 3373 3294 0
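To see why the union and reindex step matters, here is the same approach wrapped in a function (`full_matrix` is a hypothetical name) and applied to a tiny input where 'A' only ever appears first and 'C' only ever appears second. After `unstack`, 'C' is missing from the index and 'A' from the columns; the reindex restores both.

```python
import pandas as pd

def full_matrix(pairs):
    """Expand {(row, col): value}, with each unordered pair stored once, into a full symmetric DataFrame."""
    half = pd.Series(pairs).unstack(fill_value=0)
    # labels that only ever appear first (or only second) are missing from one axis
    labels = half.index.union(half.columns)
    half = half.reindex(index=labels, columns=labels, fill_value=0)
    return half + half.T

small = {('A', 'B'): 1, ('A', 'C'): 2, ('B', 'C'): 3}
print(full_matrix(small))
```

This relies on each unordered pair appearing only once. If a dict contained both ('A', 'B') and ('B', 'A'), or a diagonal key such as ('A', 'A'), adding the transpose would double those values.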