I have a text file with a column called ID that holds values such as A, B and C:
```
  variant_id gene tss_distance ma_samples ma_count ID
1       chr1 ENSG           80         68       78  A
2       chr1 ENSG           80        395      486  B
3       chr1 ENSG           80        167      183  C
```
I also have a square 5000 x 5000 matrix whose row names are the same as its column names. These names match some of the IDs in the data frame's ID column, but the matrix may contain extra IDs that are not in the data frame.
```
    [A]     [B]     [C]     [D]
[A] value1  value2  value3  value4
[B] value5  value6  value7  value8
[C] value9  value10 value11 value12
[D] value13 value14 value15 value16
```
I want the matrix's row names and column names to exactly match the ID column in the data frame.
For example, the result should look like this. D is dropped from both the rows and the columns because it does not appear in the data frame:
```
    [A]     [B]     [C]
[A] value1  value2  value3
[B] value5  value6  value7
[C] value9  value10 value11
```
The problem with the command below is that it only filters the rows. I need both the rows and the columns that are not in the data frame to be removed, so that the matrix stays square and its row and column names exactly match the ID column.
```r
matrix <- matrix[row.names(matrix) %in% dataframe$ID, ]
```
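The row filter above only indexes the first dimension of the matrix. Because the row names and column names are identical, the same logical vector can index both dimensions. A minimal sketch, assuming a small toy matrix `mat` and data frame `df1` as hypothetical stand-ins for the real 5000 x 5000 data:

```r
# Hypothetical toy data standing in for the real matrix and data frame
df1 <- data.frame(ID = c("A", "B", "C"), stringsAsFactors = FALSE)
mat <- matrix(1:16, ncol = 4,
              dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# TRUE for every matrix ID that also appears in the data frame
keep <- rownames(mat) %in% df1$ID

# Use the same logical vector for rows and columns so the result stays square;
# drop = FALSE keeps a matrix even if only one ID survives
sub <- mat[keep, keep, drop = FALSE]

identical(rownames(sub), colnames(sub))
#> [1] TRUE
dim(sub)
#> [1] 3 3
```

Note that this keeps the matrix's own ordering of the IDs, not the order of the data frame's ID column.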
The row names must equal the column names, so the matrix remains square. In other words, this must return `TRUE`:
```r
identical(rownames(matrix), colnames(matrix))
```
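If the final names should also follow the order of the ID column, one option is to index by character names. `intersect()` keeps only IDs present on both sides, in the order of its first argument, which avoids a "subscript out of bounds" error when the data frame holds an ID the matrix lacks. A sketch with hypothetical toy data, where the data frame is deliberately out of order and includes an ID (`E`) that is not in the matrix:

```r
# Hypothetical toy data: df1 is out of order and contains "E", which mat lacks
df1 <- data.frame(ID = c("C", "A", "E", "B"), stringsAsFactors = FALSE)
mat <- matrix(1:16, ncol = 4,
              dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# IDs present in both objects, in the order they appear in df1$ID
ids <- intersect(df1$ID, rownames(mat))

# Character indexing selects the same names for rows and columns
sub <- mat[ids, ids, drop = FALSE]

identical(rownames(sub), colnames(sub))
#> [1] TRUE
rownames(sub)
#> [1] "C" "A" "B"
```

When the data frame contains IDs missing from the matrix, the names can only match the subset of IDs present in both.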
Use `match()` twice: once to find the row positions and once to find the column positions of the IDs.
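```r
# Example data: the data frame holds IDs A-C, the matrix has A-D
df1 <- data.frame(ID = LETTERS[1:3])
mat <- matrix(1:16, ncol = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Positions of each data-frame ID among the matrix row and column names
i_row <- match(df1$ID, rownames(mat))
i_col <- match(df1$ID, colnames(mat))

# Subset rows and columns together; the result follows the order of df1$ID
mat[i_row, i_col]
#>   A B  C
#> A 1 5  9
#> B 2 6 10
#> C 3 7 11
```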
Created on 2022-12-20 with reprex v2.0.2
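One caveat: `match()` returns `NA` for any ID in the data frame that is not in the matrix, and indexing a matrix with `NA` produces a row and column of `NA`s rather than an error. A minimal sketch of guarding against that, assuming hypothetical toy data where the data frame contains an extra ID `E`:

```r
# Hypothetical toy data: "E" is in the data frame but not in the matrix
df1 <- data.frame(ID = c("A", "B", "C", "E"), stringsAsFactors = FALSE)
mat <- matrix(1:16, ncol = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

i_row <- match(df1$ID, rownames(mat))
i_col <- match(df1$ID, colnames(mat))
i_row  # the NA marks "E"
#> [1]  1  2  3 NA

# Drop unmatched IDs before indexing so no NA rows or columns appear
ok <- !is.na(i_row) & !is.na(i_col)
sub <- mat[i_row[ok], i_col[ok], drop = FALSE]
dim(sub)
#> [1] 3 3
```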