简体   繁体   中英

R Convert the Dataframe to Matrix based on Certain Columns

I have a data looks likes below:

ID  Group1  Group2  Group3  Group4  Group5
1   1       0       2       0       3
2   0       0       3       0       2
3   0       0       0       0       1
4   0       3       1       1       0

I want to convert the above data frame into a matrix with the ID column as the row and column headers of the matrix. If two IDs belong to one same group (no matter the value in the group column), then the cell in the matrix will be 1. If two IDs belong to two same group, then the cell in the matrix will be 2. Basically, the matrix will look like:

  1 2 3 4
1 0 2 1 1
2 2 0 1 1
3 1 1 0 0
4 1 1 0 0

Remove the ID column, convert it to a logical matrix, multiply it by its transpose using tcrossprod (which will also coerce the TRUE/FALSE values to 1/0) and zero out its diagonal by multiplying by one minus the identity matrix.

tcrossprod(DF[-1] > 0) * (1 - diag(nrow(DF)))

giving:

     [,1] [,2] [,3] [,4]
[1,]    0    2    1    1
[2,]    2    0    1    1
[3,]    1    1    0    0
[4,]    1    1    0    0

This variation also works:

diag<-`(tcrossprod(DF[-1] > 0), 0)

Note

Lines <- "ID  Group1  Group2  Group3  Group4  Group5
1   1       0       2       0       3
2   0       0       3       0       2
3   0       0       0       0       1
4   0       3       1       1       0"
DF <- read.table(text = Lines, header = TRUE)

You can write a function which calculates how many non-zero values are present at the same location between two rows. Use outer to calculate it for every combination of rows.

calc_values <- function(x, y) sum(df[x, -1] != 0 & df[y, -1] != 0)

nr <- seq(nrow(df))
mat <- outer(nr, nr, Vectorize(calc_values))
diag(mat) <- 0
dimnames(mat) <- list(df$ID, df$ID)
mat

#  1 2 3 4
#1 0 2 1 1
#2 2 0 1 1
#3 1 1 0 0
#4 1 1 0 0

data

df <- structure(list(ID = 1:4, Group1 = c(1L, 0L, 0L, 0L), Group2 = c(0L, 
0L, 0L, 3L), Group3 = c(2L, 3L, 0L, 1L), Group4 = c(0L, 0L, 0L, 
1L), Group5 = 3:0), class = "data.frame", row.names = c(NA, -4L))

Here is a solution using for loops and multiplying logical vectors of rows:

a <- structure(c(1, 0, 0, 0, 0, 0, 0, 3, 2, 3, 0, 1, 0, 0, 0, 1, 3,
2, 1, 0), .Dim = 4:5, .Dimnames = list(NULL, c("g1", "g2", "g3",
"g4", "g5")))

b <- matrix(nrow = nrow(a), ncol = nrow(a))

for (i in 1:nrow(a)) {
  for (j in 1:nrow(a)){
    if(i == j) b[i, j] <- 0
    else b[i, j] <- sum((a[i, ] > 0) * (a[j, ] > 0))
  }
}

b

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM