Selecting value based on variable name of data frame column in R
I have a data frame with a few columns containing values, and a column containing the name of the relevant column, e.g.
df <- data.frame(p1=c("A", "B", "A"),
                 p2=c("C", "C", "D"),
                 name=c("p2", "p1", "p1"), stringsAsFactors=FALSE)
What I want to do is retrieve, for each row, the value from the column specified by the name field, i.e. the output below.
> df
  p1 p2 name value
1  A  C   p2     C
2  B  C   p1     B
3  A  D   p1     A
I currently get by with df$value <- ifelse(df$name=="p1", df$p1, ifelse(df$name=="p2", df$p2, NA)), which is inelegant and does not scale if there are more columns than just p1 and p2.
Any suggestions on a better approach for this?
You could try
df$value <- df[cbind(seq_len(nrow(df)), match(df$name, names(df)))]
The above is a vectorized solution. Or, if you only need a compact solution (in terms of the number of characters):
diag(as.matrix(df[,df$name]))
#[1] "C" "B" "A"
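The vectorized line above can be unpacked step by step. This is only a sketch using the question's example data; the intermediate names rows, cols and idx are introduced here for illustration and are not part of the original answer.

```r
# Same example data as in the question
df <- data.frame(p1 = c("A", "B", "A"),
                 p2 = c("C", "C", "D"),
                 name = c("p2", "p1", "p1"), stringsAsFactors = FALSE)

rows <- seq_len(nrow(df))            # one entry per row: 1 2 3
cols <- match(df$name, names(df))    # position of the column named in each row: 2 1 1
idx  <- cbind(rows, cols)            # two-column matrix of (row, column) pairs

# Indexing with a two-column matrix picks one cell per (row, column) pair
value <- df[idx]
```

A row whose name does not match any column gets NA from match(), and an NA row in the index matrix should yield NA in the result, which mirrors the NA fallback of the nested ifelse in the question.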
Benchmarks on a larger data frame:
df1 <- df[rep(1:nrow(df),1e5),]
akrun <- function() {df1[cbind(seq_len(nrow(df1)),
                               match(df1$name, names(df1)))]}
colonel <- function() {apply(df1, 1 ,function(u) u[u['name']])}
library(microbenchmark)
microbenchmark(akrun(), colonel(), times=20L, unit='relative')
#Unit: relative
#      expr      min       lq     mean   median       uq      max neval cld
#   akrun()   1.0000   1.0000  1.00000  1.00000  1.00000  1.00000    20  a
# colonel() 118.2858 102.3968 46.25946 77.92023 59.15559 23.56562    20   b
Or very simply (but using a loop):
df$value = apply(df, 1 ,function(u) u[u['name']])
#> df
#  p1 p2 name value
#1  A  C   p2     C
#2  B  C   p1     B
#3  A  D   p1     A
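One caveat worth sketching: both the matrix-index approach and apply(df, 1, ...) convert the whole data frame to a matrix first, so with a character name column the selected values come back as character even if the value columns are numeric. The example below assumes a hypothetical data frame num with numeric p1 and p2, and a hypothetical vector value_cols listing the candidate columns; neither appears in the original posts.

```r
# Hypothetical data: numeric value columns plus a character 'name' column
num <- data.frame(p1 = c(1, 2, 3),
                  p2 = c(4, 5, 6),
                  name = c("p2", "p1", "p1"), stringsAsFactors = FALSE)

value_cols <- c("p1", "p2")            # assumed list of candidate columns
m <- as.matrix(num[value_cols])        # numeric matrix, 'name' column left out

# Column positions are looked up within value_cols, not names(num)
value <- m[cbind(seq_len(nrow(num)), match(num$name, value_cols))]
```

Restricting the matrix to the value columns keeps the result numeric, at the cost of having to state which columns are candidates.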