
How to match rownames and colnames of a matrix to a column in a dataframe in R?

I have a text file with a column called ID whose values look like A, B, C:

  variant_id gene tss_distance ma_samples ma_count ID
1       chr1 ENSG           80         68       78  A
2       chr1 ENSG           80        395      486  B
3       chr1 ENSG           80        167      183  C

I also have a square matrix, 5000 x 5000, whose rownames are the same as its colnames. The rownames and colnames match some of the IDs in the ID column of the dataframe. The matrix may contain extra IDs that are not found in the dataframe.

    [A]     [B]     [C]     [D]
[A] value1  value2  value3  value4
[B] value5  value6  value7  value8
[C] value9  value10 value11 value12
[D] value13 value14 value15 value16

I want the matrix rownames and colnames to exactly match the ID column in the dataframe.

The matrix should look like this, for example. Note that D does not appear, because D is missing from the dataframe:

    [A]     [B]     [C]
[A] value1  value2  value3
[B] value5  value6  value7
[C] value9  value10 value11

The problem with the command below is that it only filters the rows. I need both the rows and the columns removed when their ID is not found in the dataframe, so that the matrix stays square. The rownames and colnames of the final matrix should exactly match the ID column in the dataframe.

matrix <- matrix[row.names(matrix)%in%dataframe$ID,]

The rownames must equal the colnames, so the result is still a square matrix. This must be true:

identical(rownames(matrix),colnames(matrix))

Use match twice, once for the rows and once for the columns.

df1 <- data.frame(ID = LETTERS[1:3])
mat <- matrix(1:16, ncol = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

i_row <- match(df1$ID, rownames(mat))
i_col <- match(df1$ID, colnames(mat))
mat[i_row, i_col]
#>   A B  C
#> A 1 5  9
#> B 2 6 10
#> C 3 7 11
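The difference between the `%in%` filter from the question and `match` may be easier to see side by side. The sketch below is illustrative only: the shuffled `df1` is made-up data, and reusing one index for both dimensions assumes, as the question states, that the rownames and colnames of the matrix are identical.

```r
# Hypothetical toy data: the data frame lists the IDs in a different
# order than the matrix does.
df1 <- data.frame(ID = c("C", "A", "B"))
mat <- matrix(1:16, ncol = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Logical filter applied to both dimensions: keeps the matrix's own order.
keep <- rownames(mat) %in% df1$ID
by_filter <- mat[keep, keep]

# match() applied to both dimensions: reorders to follow df1$ID.
# One index is reused because rownames(mat) equals colnames(mat).
idx <- match(df1$ID, rownames(mat))
by_match <- mat[idx, idx]

rownames(by_filter)  # "A" "B" "C"
rownames(by_match)   # "C" "A" "B"
identical(rownames(by_match), colnames(by_match))  # TRUE
```

Both results are square. Only the `match` version guarantees that the order of the names follows the ID column.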

Created on 2022-12-20 with reprex v2.0.2
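One caveat: `match` returns `NA` for an ID that is absent from the matrix, and indexing with `NA` produces a row or column of `NA` values. The question only says the matrix may have extra IDs, so the reverse case may never arise. If it can, or if the ID column contains repeats, a guarded variant along these lines might help. The data here is invented for illustration, and `missing_ids` is a hypothetical helper name.

```r
# Hypothetical data: "E" is in the data frame but not in the matrix,
# and "A" appears twice.
df1 <- data.frame(ID = c("A", "B", "A", "E", "C"))
mat <- matrix(1:16, ncol = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Unique data-frame IDs that also exist in the matrix, in data-frame order.
ids <- intersect(df1$ID, rownames(mat))

# Character indexing selects rows and columns by name;
# drop = FALSE keeps a matrix even if only one ID survives.
sub <- mat[ids, ids, drop = FALSE]

# IDs that could not be matched, reported rather than silently dropped.
missing_ids <- setdiff(df1$ID, rownames(mat))

identical(rownames(sub), colnames(sub))  # TRUE
```

With this variant the names of `sub` match the unique IDs shared by both objects, not the raw ID column, so it is a relaxation of the exact-match requirement in the question.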
