
How can I transform a dictionary of dictionaries into a matrix?

I want to compute correlation percentages between multiple items that appear in log files. For each pair of items, I divide the number of times an item appears while the other item is present by the total number of times that item appears. I won't go into too much detail, but this correlation is not symmetric: the correlation between A and B is not the same as the correlation between B and A.
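A minimal sketch of that computation, assuming (hypothetically) that each log entry has already been reduced to the set of items it contains, and that the value stored for (A, B) is the share of entries containing A that also contain B. The function name conditional_cooccurrence and the sample entries are made up for illustration; the result uses the same nested layout as the dictionary shown in the question.

```python
from collections import Counter
from itertools import permutations


def conditional_cooccurrence(entries):
    """For each ordered pair (a, b), return the fraction of entries that
    contain a and also contain b. The result is not symmetric."""
    item_counts = Counter()  # number of entries containing each item
    pair_counts = Counter()  # number of entries containing both a and b
    for entry in entries:
        items = set(entry)
        item_counts.update(items)
        # permutations yields both (a, b) and (b, a) for every distinct pair
        pair_counts.update(permutations(items, 2))

    result = {}
    for (a, b), n_ab in pair_counts.items():
        result.setdefault(a, {})[b] = n_ab / item_counts[a]
    return result


# Hypothetical log entries: each one is the set of items seen together
entries = [{"itemA", "itemB"}, {"itemA", "itemB", "itemC"}, {"itemA"}, {"itemB"}]
print(conditional_cooccurrence(entries))
```

Here itemC only appears in one entry, which also contains itemA, so result["itemC"]["itemA"] is 1.0, while itemA appears in three entries and only one of them contains itemC, so result["itemA"]["itemC"] is 1/3. That is the asymmetry described above.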

As output, I get a dictionary in the following format:

{
    itemA:  {
        itemB: 0.85,
        itemC: 0.12
    },
    itemB:  {
        itemA: 0.68,
        itemC: 0.24
    },
    itemC:  {
        itemA: 0.28
    }
}

I tried sklearn's DictVectorizer, but it doesn't work since it requires a list of dictionaries.

I would like the output to be a matrix that I can visualise with matplotlib, something like this:

[[1, 0.85, 0.12],
 [0.68, 1, 0.24],
 [0.28, 0, 1]]

If possible, I would also like a matplotlib visualisation with a label for each row and column, since my dict has many more than 3 items.

I hope that everything is clear. Thank you for your help.

You can do this efficiently with pandas and numpy:

import pandas as pd

d = {
    'itemA':  {
        'itemB': 0.85,
        'itemC': 0.12
    },
    'itemB':  {
        'itemA': 0.68,
        'itemC': 0.24
    },
    'itemC':  {
        'itemA': 0.28
    }
}

df = pd.DataFrame(d)

# sort rows and columns the same way (alphabetically) so that
# row i and column i refer to the same item
df = df.sort_index(axis=0)
df = df.sort_index(axis=1)

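# pd.DataFrame puts the outer keys in the columns, so transpose to get one
# row per outer key; copy=True gives a writable array for the edits below
a = df.to_numpy(copy=True).T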

# if needed, fill the diagonal with 1 and replace NaN with 0
import numpy as np

np.fill_diagonal(a, 1)
a[np.isnan(a)] = 0

The matrix is now:

array([[1.  , 0.85, 0.12],
       [0.68, 1.  , 0.24],
       [0.28, 0.  , 1.  ]])
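If some item only ever appears as an inner key (or only as an outer key), sorting alone can leave the rows and columns with different item sets, and then the diagonal no longer lines up. A slightly more defensive sketch builds the DataFrame row-wise with from_dict(orient="index") and reindexes both axes with one list of all items:

```python
import numpy as np
import pandas as pd

d = {
    "itemA": {"itemB": 0.85, "itemC": 0.12},
    "itemB": {"itemA": 0.68, "itemC": 0.24},
    "itemC": {"itemA": 0.28},
}

# every item that appears anywhere, as an outer key or as an inner key
items = sorted(set(d).union(*d.values()))

# orient="index": one row per outer key, one column per inner key,
# so no transpose is needed afterwards
df = pd.DataFrame.from_dict(d, orient="index")

# identical order on both axes; an item missing from one axis gets a
# row or column of zeros instead of shifting the diagonal
df = df.reindex(index=items, columns=items).fillna(0)

a = df.to_numpy(copy=True)  # writable copy of the values
np.fill_diagonal(a, 1)      # an item always co-occurs with itself
print(df.index.tolist())
print(a)
```

Because df keeps the item names, df.index can be passed directly to the tick-label calls when plotting.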

To visualize this matrix:

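import matplotlib.pyplot as plt
plt.matshow(a)
plt.xticks(range(a.shape[1]), df.index, rotation=90)  # label the columns
plt.yticks(range(a.shape[0]), df.columns)              # label the rows
plt.colorbar()
plt.show()

By default, matshow only labels the rows and columns with their integer indices; the xticks and yticks calls replace those with the item names.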
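With many items it can also help to print each value in its cell and add a colour bar. A possible sketch, where plot_matrix is a hypothetical helper name and the matrix and labels are hard-coded for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_matrix(a, labels):
    """Draw a square matrix with item names on both axes and each value in its cell."""
    fig, ax = plt.subplots()
    im = ax.matshow(a, vmin=0, vmax=1, cmap="viridis")
    n = len(labels)
    ax.set_xticks(range(n))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(range(n))
    ax.set_yticklabels(labels)
    # cell (row i, column j) is drawn at x=j, y=i
    for i in range(n):
        for j in range(n):
            colour = "k" if a[i, j] > 0.5 else "w"  # dark text on light cells
            ax.text(j, i, f"{a[i, j]:.2f}", ha="center", va="center", color=colour)
    fig.colorbar(im, ax=ax)
    return fig, ax


labels = ["itemA", "itemB", "itemC"]
a = np.array([[1, 0.85, 0.12], [0.68, 1, 0.24], [0.28, 0, 1]])
fig, ax = plot_matrix(a, labels)
plt.show()
```

For a large matrix the cell annotations become unreadable, so you may want to drop the ax.text loop or pass a smaller fontsize.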

Here is code that builds the matrix as a plain list of lists (no pandas needed); you can easily adapt it to whatever sequence type you want to use.

dictionary = {
    'itemA':  {
        'itemB': 0.85,
        'itemC': 0.12
    },
    'itemB':  {
        'itemA': 0.68,
        'itemC': 0.24
    },
    'itemC':  {
        'itemA': 0.28
    }
}

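# use the same (sorted) item order for rows and columns,
# including items that only ever appear as inner keys
keys = sorted(set(dictionary).union(*dictionary.values()))

matrix = []
for row_key in keys:
    row = dictionary.get(row_key, {})
    # 1 on the diagonal, the stored value if present, otherwise 0
    matrix.append([1 if col_key == row_key else row.get(col_key, 0)
                   for col_key in keys])

print(matrix)

Output:

[[1, 0.85, 0.12], [0.68, 1, 0.24], [0.28, 0, 1]]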
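The same idea also works without building nested lists: preallocate an identity matrix (which already has 1 on the diagonal and 0 everywhere else) and write each value at the position given by a name-to-index mapping. This sketch assumes NumPy is available, and it does not depend on the order of the dictionary's keys:

```python
import numpy as np

dictionary = {
    "itemA": {"itemB": 0.85, "itemC": 0.12},
    "itemB": {"itemA": 0.68, "itemC": 0.24},
    "itemC": {"itemA": 0.28},
}

# all items, sorted, and a lookup from item name to row/column position
items = sorted(set(dictionary).union(*dictionary.values()))
index = {item: i for i, item in enumerate(items)}

a = np.eye(len(items))  # 1 on the diagonal, 0 everywhere else
for row_item, row in dictionary.items():
    for col_item, value in row.items():
        a[index[row_item], index[col_item]] = value

print(items)
print(a)
```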

