How to get an Adjacency matrix from count matrix

I have a very sparse n x p count matrix with only non-negative values and columns named y_1, ..., y_p (n = 2 million, p = 70).

Using R, I want to convert it into a p x p matrix that counts the number of times y_i and y_j both have a non-zero value on the same row.

Example:

ID a b c d e 
1  1 0 1 0 0
2  0 1 1 0 0
3  0 0 1 1 0
4  1 1 0 0 0

and I want to obtain:

- a b c d e
a 2 1 1 0 0
b 1 2 1 0 0 
c 1 1 3 1 0
d 0 0 1 1 0
e 0 0 0 0 0

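This is a simple matrix multiplication: for a 0/1 matrix m, entry (i, j) of t(m) %*% m is the number of rows in which columns i and j are both 1.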

t(m) %*% m
  a b c d e
a 2 1 1 0 0
b 1 2 1 0 0
c 1 1 3 1 0
d 0 0 1 1 0
e 0 0 0 0 0

Using this data:

m = read.table(text = "ID a b c d e 
1  1 0 1 0 0
2  0 1 1 0 0
3  0 0 1 1 0
4  1 1 0 0 0", header = TRUE)
m = as.matrix(m[, -1])
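
To see why the product gives co-occurrence counts, it may help to compare one entry of the product with a direct count. The sketch below rebuilds the same 0/1 example data so that it runs on its own; the names by_product and by_counting are only illustrative.

```r
# Same 0/1 data as the example, rebuilt here so the snippet is self-contained
m <- matrix(c(1, 0, 1, 0, 0,
              0, 1, 1, 0, 0,
              0, 0, 1, 1, 0,
              1, 1, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# Entry (a, c) of t(m) %*% m is the sum over rows of m[, "a"] * m[, "c"]
by_product <- (t(m) %*% m)["a", "c"]

# Direct count of the rows where both columns are non-zero
by_counting <- sum(m[, "a"] > 0 & m[, "c"] > 0)

# Both approaches should agree for every pair of columns
by_product == by_counting
```

The diagonal entries follow the same rule: entry (i, i) is simply the number of rows in which column i is non-zero.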

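This relies on the original matrix containing only 1s and 0s. If it does not, you can create such a matrix with m = original_matrix > 0 (multiply the result by 1 if you want numeric 0/1 values rather than TRUE/FALSE).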
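
As a rough illustration of why the 0/1 conversion matters, here is a small sketch with a made-up count matrix (the values are hypothetical, not taken from the question). Without the conversion, the product sums products of counts instead of counting rows.

```r
# Hypothetical count matrix with values above 1; same non-zero pattern as the example
counts <- matrix(c(3, 0, 2, 0, 0,
                   0, 1, 5, 0, 0,
                   0, 0, 1, 4, 0,
                   2, 7, 0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# Multiplying the raw counts gives sums of products, not co-occurrence counts
raw <- crossprod(counts)        # crossprod(x) computes t(x) %*% x

# Convert to a numeric 0/1 indicator matrix first, then multiply
ind <- (counts > 0) * 1
co  <- crossprod(ind)

raw["a", "a"]  # 3^2 + 2^2, not the number of rows where a is non-zero
co["a", "a"]   # number of rows where a is non-zero
```

Here crossprod() is used as a shorthand for t(x) %*% x; either form should give the same matrix.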

Here it is working on a matrix like the one you describe:

library(Matrix)
nr = 2e6
nc = 70
mm = Matrix(0, nrow = nr, ncol = nc, sparse = TRUE)

# make about three 1s per row on average (duplicate positions are possible, so slightly fewer)
set.seed(47)
mm[cbind(sample(nr, size = 3 * nr, replace = TRUE), sample(nc, size = 3 * nr, replace = TRUE))] = 1

system.time({res = t(mm) %*% mm})
  #  user  system elapsed 
  # 0.836   0.057   0.895 
format(object.size(res), units = "Mb")
  # [1] "0.1 Mb"

On my laptop the calculation takes less than a second and the result is about 0.1 Mb.
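
If the sparse matrix holds real counts rather than 1s, one possible way to binarize it without converting to a dense matrix is to overwrite the stored non-zero values. This is only a sketch: it assumes the matrix is stored as a dgCMatrix (the default numeric sparse class in the Matrix package), and the small matrix cnt below is made up for illustration.

```r
library(Matrix)

# Hypothetical sparse count matrix: 6 rows, 3 columns, some counts above 1
cnt <- sparseMatrix(i = c(1, 1, 2, 3, 3, 5),
                    j = c(1, 2, 2, 1, 3, 3),
                    x = c(4, 1, 2, 7, 3, 5),
                    dims = c(6, 3),
                    dimnames = list(NULL, c("y_1", "y_2", "y_3")))

# Drop any explicitly stored zeros, then set every stored value to 1
ind <- drop0(cnt)
ind@x[] <- 1

# Sparse cross-product: t(ind) %*% ind
co <- crossprod(ind)
co
```

Because only the stored entries are touched, memory use stays proportional to the number of non-zero values rather than to n x p.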

