I have a data looks likes below:
ID Group1 Group2 Group3 Group4 Group5
1 1 0 2 0 3
2 0 0 3 0 2
3 0 0 0 0 1
4 0 3 1 1 0
I want to convert the above data frame into a matrix with the ID column as the row and column headers of the matrix. If two IDs belong to one same group (no matter the value in the group column), then the cell in the matrix will be 1. If two IDs belong to two same group, then the cell in the matrix will be 2. Basically, the matrix will look like:
1 2 3 4
1 0 2 1 1
2 2 0 1 1
3 1 1 0 0
4 1 1 0 0
Remove the ID column, convert it to a logical matrix, multiply it by its transpose using tcrossprod (which will also coerce the TRUE/FALSE values to 1/0) and zero out its diagonal by multiplying by one minus the identity matrix.
tcrossprod(DF[-1] > 0) * (1 - diag(nrow(DF)))
giving:
[,1] [,2] [,3] [,4]
[1,] 0 2 1 1
[2,] 2 0 1 1
[3,] 1 1 0 0
[4,] 1 1 0 0
This variation also works:
diag<-`(tcrossprod(DF[-1] > 0), 0)
Lines <- "ID Group1 Group2 Group3 Group4 Group5
1 1 0 2 0 3
2 0 0 3 0 2
3 0 0 0 0 1
4 0 3 1 1 0"
DF <- read.table(text = Lines, header = TRUE)
You can write a function which calculates how many non-zero values are present at the same location between two rows. Use outer
to calculate it for every combination of rows.
calc_values <- function(x, y) sum(df[x, -1] != 0 & df[y, -1] != 0)
nr <- seq(nrow(df))
mat <- outer(nr, nr, Vectorize(calc_values))
diag(mat) <- 0
dimnames(mat) <- list(df$ID, df$ID)
mat
# 1 2 3 4
#1 0 2 1 1
#2 2 0 1 1
#3 1 1 0 0
#4 1 1 0 0
data
df <- structure(list(ID = 1:4, Group1 = c(1L, 0L, 0L, 0L), Group2 = c(0L,
0L, 0L, 3L), Group3 = c(2L, 3L, 0L, 1L), Group4 = c(0L, 0L, 0L,
1L), Group5 = 3:0), class = "data.frame", row.names = c(NA, -4L))
Here is a solution using for loops and multiplying logical vectors of rows:
a <- structure(c(1, 0, 0, 0, 0, 0, 0, 3, 2, 3, 0, 1, 0, 0, 0, 1, 3,
2, 1, 0), .Dim = 4:5, .Dimnames = list(NULL, c("g1", "g2", "g3",
"g4", "g5")))
b <- matrix(nrow = nrow(a), ncol = nrow(a))
for (i in 1:nrow(a)) {
for (j in 1:nrow(a)){
if(i == j) b[i, j] <- 0
else b[i, j] <- sum((a[i, ] > 0) * (a[j, ] > 0))
}
}
b
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.