
Create Similarity Matrix

I have a matrix that looks like the following:

col_1  col_2  value
A      B      2.1
A      C      1.3
B      C      4.6
A      D      1.4
....

I would like to get a similarity matrix:

     A    B    C    D
A    X    2.1  1.3  1.4
B    2.1  X    4.6  ...
C    ...  ...  X    ...
D    ...  ...  ...  X

So the row and column names are A, B, C, D, and each cell takes its value from the third column of the original data. A further complication is that the original matrix has approximately 10,000 rows.

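As Roland suggests, you could use dcast() to reshape the long table into wide form. Only the pairs present in the data are filled; the rest are NA:

library(data.table)
dcast(df, col_1 ~ col_2, value.var = "value")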
##   col_1   B   C   D
## 1     A 2.1 1.3 1.4
## 2     B  NA 4.6  NA

Where:

df <- data.frame(
  col_1 = c("A", "A", "B", "A"), 
  col_2 = c("B","C", "C", "D"), 
  value = c(2.1, 1.3, 4.6, 1.4)
)

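With xtabs and mutate_at. sparse = TRUE turns the output into a sparseMatrix:

library(dplyr)
library(Matrix)

mat <- df %>%
  # Give both columns the same factor levels so the table is square.
  # as.character() makes this work whether or not the columns are already factors.
  mutate_at(1:2, factor, levels = sort(unique(c(as.character(.$col_1), as.character(.$col_2))))) %>%
  xtabs(value ~ col_1 + col_2, data=., sparse = TRUE)

# Mirror the upper triangle into the lower one.
# This assumes each pair appears only once, in one direction, in df.
mat <- mat + t(mat)

Result:

4 x 4 sparse Matrix of class "dgCMatrix"
     col_2
col_1   A   B   C   D
    A .   2.1 1.3 1.4
    B 2.1 .   4.6 .  
    C 1.3 4.6 .   .  
    D 1.4 .   .   .  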

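You can do it the following way. The code is in Python, since the question does not specify a language.

import numpy as np
import pandas as pd

# Sample rows from the question; replace this with however you load your data
# into a pandas DataFrame with the columns col_1, col_2 and value.
df = pd.DataFrame({
    'col_1': ['A', 'A', 'B', 'A'],
    'col_2': ['B', 'C', 'C', 'D'],
    'value': [2.1, 1.3, 4.6, 1.4],
})

list_of_labels = ['A', 'B', 'C', 'D']
nb_labels = len(list_of_labels)
# Look up each label's position once instead of calling list.index() per row
position = {label: k for k, label in enumerate(list_of_labels)}
similarity = np.zeros((nb_labels, nb_labels))

for l1, l2, val in zip(df['col_1'], df['col_2'], df['value']):
    i = position[l1]
    j = position[l2]
    # Fill both (i, j) and (j, i) so the matrix is symmetric
    similarity[i, j] = val
    similarity[j, i] = val

similarity_df = pd.DataFrame(data=similarity, index=list_of_labels, columns=list_of_labels)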
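
For around 10,000 rows a loop is fine, but the same result can also be built without one. The following is only a sketch under a few assumptions: the data is already in a pandas DataFrame with the columns col_1, col_2 and value, each pair occurs at most once and in one direction only, and the names labels, upper, vals and sym are mine rather than anything from the question. Pairs that never occur, and the diagonal, are left as NaN instead of 0.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; replace with your own data.
df = pd.DataFrame({
    "col_1": ["A", "A", "B", "A"],
    "col_2": ["B", "C", "C", "D"],
    "value": [2.1, 1.3, 4.6, 1.4],
})

# Every label that occurs in either column, in sorted order.
labels = sorted(set(df["col_1"]) | set(df["col_2"]))

# Wide table of the pairs as given (one triangle only), padded out to a
# square labels x labels frame. pivot() raises if a pair is duplicated.
upper = (
    df.pivot(index="col_1", columns="col_2", values="value")
      .reindex(index=labels, columns=labels)
)

# Where a cell is missing, take the value from its mirror image.
vals = upper.to_numpy()
sym = np.where(np.isnan(vals), vals.T, vals)

similarity_df = pd.DataFrame(sym, index=labels, columns=labels)
print(similarity_df)
```

If you would rather have zeros for the missing pairs, similarity_df.fillna(0) does that afterwards.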
