I have a data frame with four columns: two columns indicate participation in a sport, while the other two columns indicate whether the player passed each of their two fitness exams.
dat <- data.frame(SOCCER = sample(0:1, 10, replace = T),
BASEBALL = sample(0:1, 10, replace = T),
TEST_1_PASS = sample(0:1, 10, replace = T),
TEST_2_PASS = sample(0:1, 10, replace = T))
I would like to obtain a list containing contingency tables for each sport and exam. I know that I can accomplish this using the following code, which uses nested lapply statements, but this strikes me as inefficient. Can anyone propose a more elegant solution that doesn't use nesting?
results <- lapply(c("SOCCER", "BASEBALL"), function(x) {
lapply(c("TEST_1_PASS", "TEST_2_PASS"), function(y){
table(sport = dat[[x]], pass = dat[[y]])
})
})
Thanks as always!
The double lapply
gets all pairwise combinations of the columns in each of the columns' vectors, like @Gregor wrote in a comment
I don't think your nesting is inefficient... you need to call table 4 times. It doesn't really matter if that's inside one loop/lapply over 4 items or 2 nested loops/lapplys with 2 items each.
But here is another way, with one of the loops in disguise as expand.grid
.
cols <- expand.grid(x = c("SOCCER", "BASEBALL"),
y = c("TEST_1_PASS", "TEST_2_PASS"),
stringsAsFactors = FALSE)
Map(function(.x, .y)table(dat[[.x]], dat[[.y]]), cols$x, cols$y)
Consider re-formatting your data into long format (ie, tidy data ), merging the two data pieces of sport and exam , then run by
(a rarely used member of apply family as object-oriented wrapper to tapply
) for all subset combinations between the two returning a named header report of results:
# RESHAPE EACH DATA SECTION (SPORT AND EXAM) INTO LONG FORMAT
df_list <- lapply(list(c("SOCCER", "BASEBALL"), c("TEST_1_PASS", "TEST_2_PASS")), function(cols)
reshape(cbind(PLAYER = row.names(dat), dat[cols]),
varying = cols, v.names = "VALUE",
times = cols, timevar = "INDICATOR",
idvar = "PLAYER", ids = NULL,
new.row.names = 1:1E4, direction = "long")
)
# CROSS JOIN (ALL COMBINATION PAIRINGS)
final_df <- Reduce(function(x,y) merge(x, y, by="PLAYER", suffixes=c("_SPORT", "_EXAM")), df_list)
final_df
# RUN TABLES FOR EACH SUBSET COMBINATION
tables_list <- with(final_df, by(final_df, list(INDICATOR_SPORT, INDICATOR_EXAM), function(sub)
table(sport = sub$VALUE_SPORT, pass = sub$VALUE_EXAM)
)
)
Output
tables_list
# : BASEBALL
# : TEST_1_PASS
# pass
# sport 0 1
# 0 3 4
# 1 2 1
# ------------------------------------------------------------
# : SOCCER
# : TEST_1_PASS
# pass
# sport 0 1
# 0 2 0
# 1 3 5
# ------------------------------------------------------------
# : BASEBALL
# : TEST_2_PASS
# pass
# sport 0 1
# 0 3 4
# 1 1 2
# ------------------------------------------------------------
# : SOCCER
# : TEST_2_PASS
# pass
# sport 0 1
# 0 2 0
# 1 2 6
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.