I'm trying to implement a generalized "switching equation" (Gerber and Green 2012, chapter 2) in r.
I have a categorical variable Z, that has k > 2 values.
I have k columns names Y_1, Y_2,... Y_k.
I want to make a variable Y that picks our the "right" values from each column. That is, if Z is 1, put the Y_1 values into Y.
I've got a solution with a loop, but it's annoying. Is there a super sweet way to do this with a one liner? No nested ifelse, pls.
N <- 100
df <- data.frame(
Z = sample(1:3, N, replace = TRUE),
Y_1 = rnorm(N),
Y_2 = rnorm(N),
Y_3 = rnorm(N)
)
# an annoying solution
df <- within(df,{
Y <- rep(NA, nrow(df))
Y[Z == 1] <- Y_1[Z == 1]
Y[Z == 2] <- Y_2[Z == 2]
Y[Z == 3] <- Y_3[Z == 3]
})
head(df)
which yields:
Z Y_1 Y_2 Y_3 Y
1 3 0.89124772 1.4377700 0.05226285 0.05226285
2 1 0.89186873 -0.6984839 -0.86141525 0.89186873
3 1 -0.01315678 1.5193461 0.18290065 -0.01315678
4 3 -0.57857274 -1.4445197 2.03764943 2.03764943
5 3 -0.19793692 -0.1818225 1.10270877 1.10270877
6 2 1.48291431 2.7264541 0.70129357 2.72645413
EDIT: I like Weihuang Wong's approach df$Y <- sapply(split(df, 1:nrow(df)), function(x) x[, paste0("Y_", x$Z)])
in part because it doesn't rely on position but rather the column names. All of the offered answers so far use column position.... I'm a tiny bit worried that sapply(split())
is slow, but maybe I'm crazy?
df$Y <- apply(df, 1, function(x) x[x[1]+1] )
head(df)
# Z Y_1 Y_2 Y_3 Y
#1 1 -0.8598997 -0.3180947 1.9374462 -0.8598997
#2 2 -0.2392902 0.2266245 0.2364991 0.2266245
#3 1 -0.8733609 -1.3892361 0.3351359 -0.8733609
#4 3 -0.6533548 -1.1042993 -0.2906852 -0.2906852
#5 1 -1.7424126 -0.2101860 0.1198945 -1.7424126
#6 2 -1.9746651 -0.4308746 -0.7849773 -0.4308746
Not exactly 1 line, but
get_result <- function(dfrow){
x <- unlist(dfrow[,1:4])
Y <- x[x[1] + 1]
}
library(purrr)
newdf <- by_row(df, get_result)
This can be done in a vectorized way with row/column
indexing
df$Y <- df[-1][cbind(1:nrow(df), df$Z)]
df
# Z Y_1 Y_2 Y_3 Y
#1 3 0.89124772 1.4377700 0.05226285 0.05226285
#2 1 0.89186873 -0.6984839 -0.86141525 0.89186873
#3 1 -0.01315678 1.5193461 0.18290065 -0.01315678
#4 3 -0.57857274 -1.4445197 2.03764943 2.03764943
#5 3 -0.19793692 -0.1818225 1.10270877 1.10270877
#6 2 1.48291431 2.7264541 0.70129357 2.72645410
df <- structure(list(Z = c(3L, 1L, 1L, 3L, 3L, 2L), Y_1 = c(0.89124772,
0.89186873, -0.01315678, -0.57857274, -0.19793692, 1.48291431
), Y_2 = c(1.43777, -0.6984839, 1.5193461, -1.4445197, -0.1818225,
2.7264541), Y_3 = c(0.05226285, -0.86141525, 0.18290065, 2.03764943,
1.10270877, 0.70129357)), .Names = c("Z", "Y_1", "Y_2", "Y_3"
), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
OP here.
I timed my "annoying" by hand solution against the "ind_split" solution proposed by Weihuang Wong. I also did it by "groups":
N <- 100000
df <- data.frame(
Z = sample(1:3, N, replace = TRUE),
Y_1 = rnorm(N),
Y_2 = rnorm(N),
Y_3 = rnorm(N)
)
ind_split <-
system.time({
df$Y <- sapply(split(df, 1:nrow(df)), function(x) x[, paste0("Y_", x$Z)])
head(df)
})
revealer <-
function(list_element){
col_name <- paste0("Y_", list_element[1, "Z"])
list_element$Y <- list_element[,col_name]
return(list_element)
}
group_split <-
system.time({
split_list <- split(df, df$Z)
df <- do.call(what = rbind, lapply(split_list, revealer))
head(df)
})
by_hand <-
system.time({
# an annoying solution
df <- within(df,{
Y <- rep(NA, nrow(df))
Y[Z == 1] <- Y_1[Z == 1]
Y[Z == 2] <- Y_2[Z == 2]
Y[Z == 3] <- Y_3[Z == 3]
})
head(df)
})
ind_split
group_split
by_hand
the timings came in at
> ind_split
user system elapsed
1.023 0.083 1.136
> group_split
user system elapsed
0.011 0.002 0.013
> by_hand
user system elapsed
0.001 0.000 0.001
The annoying by hand method is way faster, which is crazy to me! Splitting by groups is faster than splitting by individuals.
此处的后期添加是通过使用match
和names
(基于先前由Akrun提出并由我自己修改的另一个解决方案)从Akrun的答案构建的,但不使用位置编号,而是:
df$Y <- df[cbind(1:nrow(df), match(paste0('Y_', df$Z), names(df)))]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.