R indexing a data frame using values in the column of another
I have a data frame, and two of its columns are indices into another data frame. I want to add a column to the first by indexing the second, but just calling the column names isn't working.
For example, if the first data frame is:
... Gene CellLine ...
KRAS HELA ...
BRCA1 T24 ...
and my second data frame looks like
KRAS BRCA1 ...
HELA 5 3
T24 2 1
...
I want the output to look like
... Gene CellLine Dependency ...
KRAS HELA 5 ...
BRCA1 T24 1 ...
without having to loop through the rows, because the first data frame is massive. That is, is there any function or package that would do the equivalent of
# table2 has cell lines as row names and genes as column names
for (i in rownames(table1)) {
  table1[i, "Dependency"] <- ifelse(table1[i, "CellLine"] %in% rownames(table2) & table1[i, "Gene"] %in% colnames(table2), table2[table1[i, "CellLine"], table1[i, "Gene"]], NA)
}
but faster?
Thanks!
The following code is vectorized: it creates an index matrix from the two columns of df1 and uses it to extract the required values from df2.
inx <- as.matrix(df1[c("CellLine", "Gene")])
df1$Dependency <- df2[inx]
df1
# Gene CellLine Dependency
#1 KRAS HELA 5
#2 BRCA1 T24 1
Data
df1 <- read.table(text = "
Gene CellLine
KRAS HELA
BRCA1 T24
", header = TRUE)
df2 <- read.table(text = "
KRAS BRCA1
HELA 5 3
T24 2 1
", header = TRUE)
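One caveat about the index-matrix approach above: the loop in the question returns NA when a gene or cell line is missing from the second data frame, whereas indexing with a character matrix stops with a "subscript out of bounds" error if a name is absent. The following is a minimal sketch of one way to keep the NA behaviour, assuming the same layout as above (cell lines as row names, genes as column names). The third row (TP53) is a made-up example added only to show the missing-name case.

```r
# Hypothetical data: TP53 is not a column of df2, so it should yield NA.
df1 <- data.frame(
  Gene     = c("KRAS", "BRCA1", "TP53"),
  CellLine = c("HELA", "T24", "HELA"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  KRAS  = c(5L, 2L),
  BRCA1 = c(3L, 1L),
  row.names = c("HELA", "T24")
)

m <- as.matrix(df2)

# match() gives NA for names that are absent instead of raising an error.
inx <- cbind(
  match(df1$CellLine, rownames(m)),  # row positions (cell lines)
  match(df1$Gene, colnames(m))       # column positions (genes)
)

# A row of the index matrix that contains NA produces NA in the result.
df1$Dependency <- m[inx]
df1
```

This stays fully vectorized, so it should scale to a large first data frame in the same way as the character-matrix version.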
You can try this approach. The data used is as follows:
#Data
df1 <- structure(list(Gene = c("KRAS", "BRCA1"), CellLine = c("HELA",
"T24")), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(id = c("HELA", "T24"), KRAS = c(5L, 2L), BRCA1 = c(3L,
1L)), class = "data.frame", row.names = c(NA, -2L))
Then, for the code, you can melt and merge the data:
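library(reshape)
#Melt df2 into long format: one row per (id, gene) pair
Melted <- melt(df2, id.vars = 'id')
#Now merge on the gene and cell line columns, keeping every row of df1
Merged <- merge(df1, Melted, by.x = c('Gene', 'CellLine'), by.y = c('variable', 'id'), all.x = TRUE)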
The result is as follows:
Gene CellLine value
1 BRCA1 T24 1
2 KRAS HELA 5
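Note that merge() sorts the result by the key columns, so the rows come back in a different order than in df1, and the new column is called value rather than Dependency. Below is a rough sketch of the same melt-and-merge idea in base R only, which restores the original row order and names the column as in the question. The helper column row_id and the object names long and out are illustrative choices, not part of the answer above.

```r
df1 <- data.frame(Gene = c("KRAS", "BRCA1"), CellLine = c("HELA", "T24"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("HELA", "T24"), KRAS = c(5L, 2L), BRCA1 = c(3L, 1L),
                  stringsAsFactors = FALSE)

# Long format without extra packages: one row per (CellLine, Gene) pair.
genes <- setdiff(names(df2), "id")
long <- data.frame(
  CellLine   = rep(df2$id, times = length(genes)),
  Gene       = rep(genes, each = nrow(df2)),
  Dependency = unlist(df2[genes], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Remember the original row order, because merge() sorts by the keys.
df1$row_id <- seq_len(nrow(df1))
out <- merge(df1, long, by = c("Gene", "CellLine"), all.x = TRUE)

# Restore the order of df1 and drop the helper column.
out <- out[order(out$row_id), c("Gene", "CellLine", "Dependency")]
rownames(out) <- NULL
out
```

With all.x = TRUE, any gene and cell line pair in df1 that has no match in df2 is kept with NA in Dependency, which mirrors the ifelse() fallback in the question.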