Create Similarity Matrix

Question

I have a matrix that looks like the following:

col_1 col_2 value

 A      B     2.1

 A      C     1.3

 B      C     4.6

 A      D     1.4

....

I would like to get a similarity matrix:

    A    B    C   D

A    X   2.1  1.3  1.4

B    2.1  X   4.6   ...

C    ...  ...  X    ...

D    ...  ... ...    X

So the row and the column names are A,B,C,D and the it takes the value from the third column and adds it to the matrix The problem is also that the original matrix has a length of approximately 10 000 rows.

Answer 1

As Roland suggests, you could use dcast() :

library(data.table)
dcast(df, col_1 ~ col_2)
##   col_1   B   C   D
## 1     A 2.1 1.3 1.4
## 2     B  NA 4.6  NA

Where:

df <- data.frame(
  col_1 = c("A", "A", "B", "A"), 
  col_2 = c("B","C", "C", "D"), 
  value = c(2.1, 1.3, 4.6, 1.4)
)

Answer 2

With xtabs and mutate_at . sparse = TRUE turns the output to a sparseMatrix:

library(dplyr)

mat <- df %>%
  mutate_at(1:2, factor, levels = unique(c(levels(.$col_1), levels(.$col_2)))) %>%
  xtabs(value ~ col_1 + col_2, data=., sparse = TRUE)

mat[lower.tri(mat)] <- mat[upper.tri(mat)]

Result:

4 x 4 sparse Matrix of class "dgCMatrix"
     col_2
col_1   A   B   C   D
    A .   2.1 1.3 1.4
    B 2.1 .   4.6 .  
    C 1.3 1.4 .   .  
    D 4.6 .   .   .

Answer 3

You can do it the following way. I write the code in Python since no language is specified

#I assume that your data is in a python pandas dataframe called df

df = ..load your data  
list_of_labels = [ 'A','B','C','D' ]
nb_labels = len(list_of_labels)
similarity = np.zeros( (nb_labels,nb_labels) )

for l1, l2, val in zip( df['col_1'] , df['col_2'] , df['value'] ):
    i = list_of_labels.index( l1 )
    j = list_of_labels.index( l2 )
    similarity[i][j] = val


similarity_df = pd.DataFrame(data=similarity, index=list_of_labels, columns=list_of_labels)

Create Similarity Matrix

Question

3 answers

solution1
2 ACCPTED 2018-08-29 13:52:41

solution2
2 2018-08-29 15:42:38

solution3
1 2018-08-29 13:09:44

Create Similarity Matrix

Question

3 answers

solution1 2 ACCPTED 2018-08-29 13:52:41

solution2 2 2018-08-29 15:42:38

solution3 1 2018-08-29 13:09:44

solution1
2 ACCPTED 2018-08-29 13:52:41

solution2
2 2018-08-29 15:42:38

solution3
1 2018-08-29 13:09:44