[英]match rownames and colnames of correlation matrix
I have a vector with 3990 names (these are the column names of my dataframe) and I want to match them with the rows of my data.我有一个包含 3990 个名称的向量(这些是我的数据框的列名),我想将它们与我的数据行匹配。 My data contains correlation values and I want to subset my data based on the matches found
我的数据包含相关值,我想根据找到的匹配项对我的数据进行子集化
My data looks like this:我的数据如下所示:
I tried using grepl我尝试使用 grepl
result <- filter(df, grepl(paste(column_names, collapse="|"), rownames(df)))
but I get an error但我得到一个错误
error in 'grepl()': !
“grepl()”中的错误:! invalid regular expression
无效的正则表达式
my expected output would be我预期的 output 将是
does anyone have any suggestions on how can this be done?有没有人对如何做到这一点有任何建议?
would really be great if someone could help me with this!如果有人可以帮助我,那就太好了!
Best, Shweta最好的,Shweta
Try this:尝试这个:
library(dplyr)
dat %>%
filter(grepl(paste0("\\b", names(.), "\\b", collapse="|"), rownames(dat)))
TATA TATB TATC TATD
TATA,TATA 0.8 0.2 0.5 0.1
TATB 0.2 0.9 0.4 0.5
TATA 0.9 0.4 0.2 0.1
Data:数据:
dat <- data.frame(TATA = c(0.8,0.2,0.1,0.01,0.9),
TATB = c(0.2,0.9,0.2,0.4,0.4),
TATC = c(0.5,0.4,0.3,0.3,0.2),
TATD = c(0.1,0.5,0.15,0.5,0.1),
row.names = c("TATA,TATA", "TATB", "TATE", "TATM", "TATA"))
Three options you can try:您可以尝试三个选项:
library(reshape2)
melt(as.matrix(dat)
data.frame(rows=rownames(dat)[row(dat)], vars=colnames(dat)[col(dat)], values=c(dat))
as.data.frame(as.table(as.matrix(dat)))
result:结果:
Var1 Var2 Freq
1 TATA,TATA TATA 0.872624483
2 TATB TATA 0.533790730
3 TATE TATA 0.110495616
4 TATM TATA 0.253893718
5 TATA TATA 0.303576730
6 TATA,TATA TATB 0.774815753
7 TATB TATB 0.941361633
8 TATE TATB 0.305219935
9 TATM TATB 0.101124692
10 TATA TATB 0.968514156
11 TATA,TATA TATC 0.891697937
12 TATB TATC 0.006223573
13 TATE TATC 0.045138657
14 TATM TATC 0.848485971
15 TATA TATC 0.995542845
16 TATA,TATA TATD 0.479559761
17 TATB TATD 0.981808763
18 TATE TATD 0.227518091
19 TATM TATD 0.767491049
20 TATA TATD 0.410935185
data:数据:
dat <- data.frame(TATA = runif(5),
TATB = runif(5),
TATC = runif(5),
TATD = runif(5),
row.names = c("TATA,TATA", "TATB", "TATE", "TATM", "TATA"))
EDIT:编辑:
Using regex matching to subset your input data, as a first step:作为第一步,使用正则表达式匹配对输入数据进行子集化:
cols <- grep(pattern = paste0(rownames(dat), collapse = "|"), x = colnames(dat), value = TRUE)
rows <- grep(pattern = paste0(colnames(dat), collapse = "|"), x = rownames(dat), value = TRUE)
dat2 <- dat[rownames(dat) %in% rows, colnames(dat) %in% cols]
Yielding:产量:
as.data.frame(as.table(as.matrix(dat2)))
Var1 Var2 Freq
1 TATA,TATA TATA 0.6908219
2 TATB TATA 0.7255142
3 TATA TATA 0.1022963
4 TATA,TATA TATB 0.7291625
5 TATB TATB 0.7420069
6 TATA TATB 0.7480157
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.