Apologies if this is a repeat question; if the answer exists somewhere, I would appreciate being pointed to it.
I have a large data frame with many factors, a mix of categorical and continuous. Here is a shortened example:
x1 = sample(x = c("A", "B", "C"), size = 50, replace = TRUE)
x2 = sample(x = c(5, 10, 27), size = 50, replace = TRUE)
y = rnorm(50, mean=0)
dat = as.data.frame(cbind(y, x1, x2))
dat$x2 = as.numeric(dat$x2)
dat$y = as.numeric(dat$y)
> head(dat)
y x1 x2
1 9 C 2
2 7 C 2
3 8 B 1
4 21 A 2
5 48 A 1
6 19 A 3
I want to subset this dataset by x1, so I end up with 3 new datasets, one for each level of factor x1. I can do this the following way:
#A
dat.A = dat[which(dat$x1== "A"),,drop=T]
dat.A$x1 = factor(dat.A$x1)
#B
dat.B = dat[which(dat$x1== "B"),,drop=T]
dat.B$x1 = factor(dat.B$x1)
#C
dat.C = dat[which(dat$x1== "C"),,drop=T]
dat.C$x1 = factor(dat.C$x1)
This is somewhat tedious, as my real data have 7 levels of the factor of interest, so I have to repeat the code 7 times. Once I have each new data frame in my global environment, I want to apply several functions to each one (graphing, creating tables, fitting linear models). Here is a simple example:
#same plot for each dataset
A.plot = plot(dat.A$y, dat.A$x2)
B.plot = plot(dat.B$y, dat.B$x2)
C.plot = plot(dat.C$y, dat.C$x2)
#same models for each dataset
mod.A = lm(y ~ x2, data = dat.A)
summary(mod.A)
mod.B = lm(y ~ x2, data = dat.B)
summary(mod.B)
mod.C = lm(y ~ x2, data = dat.C)
summary(mod.C)
This is a lot of copying and pasting. Is there a way I can write out one line of code for each thing I want to do and loop over each dataset? Something like below, which I know is wrong but it's what I am trying to do:
for (i in datasets) {
[i].plot = plot(dat.[i]$y, dat.[i]$x2)
mod.[i] = lm(y ~ x2, data = dat[i])
}
We can do a split into a list of data.frames and then loop over the list with lapply:
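# Split into a named list with one data.frame per level of x1
lst1 <- split(dat, dat$x1)
# Draw the plot and fit the model for each subset
lst2 <- lapply(lst1, function(dat) {
  plt <- plot(dat$y, dat$x2)  # base plot() draws as a side effect; plt is NULL
  model <- lm(y ~ x2, data = dat)
  list(plt, model)
})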
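As a follow-up sketch, the models can also be kept in their own named list, so that summary() or coef() can be applied to every subset in one line. The toy data is rebuilt here with data.frame() (an assumption, so that y and x2 stay numeric), the seed is arbitrary, and the names models, summaries and coefs are only illustrative.

```r
set.seed(1)  # assumption: arbitrary seed so the sketch is reproducible

# Toy data rebuilt with data.frame() so y and x2 stay numeric
dat <- data.frame(
  y  = rnorm(50),
  x1 = factor(sample(c("A", "B", "C"), size = 50, replace = TRUE)),
  x2 = sample(c(5, 10, 27), size = 50, replace = TRUE)
)

# One data frame per level of x1; the list is named by level
lst1 <- split(dat, dat$x1)

# Fit the same model to every subset
models <- lapply(lst1, function(d) lm(y ~ x2, data = d))

# Apply any follow-up function to every model at once
summaries <- lapply(models, summary)
coefs <- sapply(models, coef)  # matrix with one column per level of x1

# Individual results are available by level name
summaries[["A"]]
```

Because split() names the list by factor level, nothing has to be created in the global environment by hand, and the same code works unchanged with 7 levels.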
For completeness' sake, here's how I would do this in the tidyverse, producing two lists: one with the plots and one with the models.
library(dplyr)
library(ggplot2)
model_list <- dat %>%
  group_by(x1) %>%
  group_map(~ lm(y ~ x2, data = .x))
plot_list <- dat %>%
  group_by(x1) %>%
  group_map(~ ggplot(.x, aes(x2, y)) + geom_point())
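One thing to be aware of: group_map() returns an unnamed list, so it is not obvious which element belongs to which level of x1. A minimal sketch of attaching names with group_keys() follows. The toy data is rebuilt with data.frame() (an assumption, so the columns keep their types), and the names grouped and keys are only illustrative.

```r
library(dplyr)

set.seed(1)  # assumption: arbitrary seed so the sketch is reproducible

# Toy data rebuilt with data.frame() so y and x2 stay numeric
dat <- data.frame(
  y  = rnorm(50),
  x1 = sample(c("A", "B", "C"), size = 50, replace = TRUE),
  x2 = sample(c(5, 10, 27), size = 50, replace = TRUE)
)

grouped <- group_by(dat, x1)

# group_keys() gives one row per group, matching the order of group_map() output
keys <- group_keys(grouped)$x1

# Fit one model per group, then label each list element with its level
model_list <- grouped %>%
  group_map(~ lm(y ~ x2, data = .x)) %>%
  setNames(keys)

# Models can now be looked up by level
model_list[["B"]]
```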