R Convert the Dataframe to Matrix based on Certain Columns

Question

I have a data looks likes below:

ID  Group1  Group2  Group3  Group4  Group5
1   1       0       2       0       3
2   0       0       3       0       2
3   0       0       0       0       1
4   0       3       1       1       0

I want to convert the above data frame into a matrix with the ID column as the row and column headers of the matrix. If two IDs belong to one same group (no matter the value in the group column), then the cell in the matrix will be 1. If two IDs belong to two same group, then the cell in the matrix will be 2. Basically, the matrix will look like:

Answer 1

Remove the ID column, convert it to a logical matrix, multiply it by its transpose using tcrossprod (which will also coerce the TRUE/FALSE values to 1/0) and zero out its diagonal by multiplying by one minus the identity matrix.

tcrossprod(DF[-1] > 0) * (1 - diag(nrow(DF)))

giving:

     [,1] [,2] [,3] [,4]
[1,]    0    2    1    1
[2,]    2    0    1    1
[3,]    1    1    0    0
[4,]    1    1    0    0

This variation also works:

diag<-`(tcrossprod(DF[-1] > 0), 0)

Note

Lines <- "ID  Group1  Group2  Group3  Group4  Group5
1   1       0       2       0       3
2   0       0       3       0       2
3   0       0       0       0       1
4   0       3       1       1       0"
DF <- read.table(text = Lines, header = TRUE)

Answer 2

You can write a function which calculates how many non-zero values are present at the same location between two rows. Use outer to calculate it for every combination of rows.

calc_values <- function(x, y) sum(df[x, -1] != 0 & df[y, -1] != 0)

nr <- seq(nrow(df))
mat <- outer(nr, nr, Vectorize(calc_values))
diag(mat) <- 0
dimnames(mat) <- list(df$ID, df$ID)
mat

#  1 2 3 4
#1 0 2 1 1
#2 2 0 1 1
#3 1 1 0 0
#4 1 1 0 0

data

df <- structure(list(ID = 1:4, Group1 = c(1L, 0L, 0L, 0L), Group2 = c(0L, 
0L, 0L, 3L), Group3 = c(2L, 3L, 0L, 1L), Group4 = c(0L, 0L, 0L, 
1L), Group5 = 3:0), class = "data.frame", row.names = c(NA, -4L))

Answer 3

Here is a solution using for loops and multiplying logical vectors of rows:

a <- structure(c(1, 0, 0, 0, 0, 0, 0, 3, 2, 3, 0, 1, 0, 0, 0, 1, 3,
2, 1, 0), .Dim = 4:5, .Dimnames = list(NULL, c("g1", "g2", "g3",
"g4", "g5")))

b <- matrix(nrow = nrow(a), ncol = nrow(a))

for (i in 1:nrow(a)) {
  for (j in 1:nrow(a)){
    if(i == j) b[i, j] <- 0
    else b[i, j] <- sum((a[i, ] > 0) * (a[j, ] > 0))
  }
}

b

R Convert the Dataframe to Matrix based on Certain Columns

Question

3 answers

solution1
4 ACCPTED 2021-05-18 03:34:55

Note

solution2
1 2021-05-18 03:32:53

solution3
1 2021-05-18 04:34:00

R Convert the Dataframe to Matrix based on Certain Columns

Question

3 answers

solution1 4 ACCPTED 2021-05-18 03:34:55

Note

solution2 1 2021-05-18 03:32:53

solution3 1 2021-05-18 04:34:00

solution1
4 ACCPTED 2021-05-18 03:34:55

solution2
1 2021-05-18 03:32:53

solution3
1 2021-05-18 04:34:00