
Using lapply to subset a dataframe based on two or more factor variables

This is an extension of the StackOverflow question "Subset Data Based On Elements In List", which answered how to create a list of new data frames, each constructed by subsetting the original one on a grouping factor variable.

The challenge I am encountering is that I need to create the data frames using more than one grouping variable.

To generalise the problem, I have created this toy dataset, which has the daily amount of rain as the response variable, and the temperature range and cloudiness of that day as classifiers.

rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

With the following code, I can produce three new dataframes grouped on the temp variable, all combined into a single list (df_1A):

temp_levels <- unique(as.character(df$temp))
df_1A <- lapply(temp_levels, function(x){subset(df, temp == x)})

And ditto for three new dataframes grouped by cloudiness (df_1B):

cloud_levels <- unique(as.character(df$clouds))
df_1B <- lapply(cloud_levels, function(x){subset(df, clouds == x)})

However, I have not been able to come up with a simple, elegant way to produce the 9 dataframes, each of which has a unique combination of temp and cloudiness.

Thanks

You could use split to divide the data based on the unique combinations of temp and clouds:

df_1 <- split(df, list(df$temp, df$clouds))
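One detail worth checking with this approach: split builds one element for every combination of factor levels, including combinations that never occur in the data. The sketch below, which rebuilds the toy data from the question, compares the default with drop = TRUE. The names df_all and df_nonempty are introduced here for illustration only.

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# Default: one element per temp/clouds combination, even if it has no rows
df_all <- split(df, list(df$temp, df$clouds))
length(df_all)       # 9

# drop = TRUE keeps only the combinations that actually occur
df_nonempty <- split(df, list(df$temp, df$clouds), drop = TRUE)
length(df_nonempty)  # 7

# Elements are named "<temp>.<clouds>", so they can be looked up by name
df_nonempty[["Cold.Lots"]]
```

In this dataset the Warm/Lots and Hot/Some combinations have no rows, so they appear as zero-row data frames in df_all and are absent from df_nonempty.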

Your question implies a preference for lapply, but if you don't mind using dplyr there is an elegant solution:


library(dplyr)

df_list <- 
   df %>% 
   group_by(temp, clouds) %>% 
   group_split()

# df_list

df_list[[1]]
#> # A tibble: 3 x 3
#>    rain temp  clouds
#>   <dbl> <fct> <fct> 
#> 1     0 Cold  Lots  
#> 2    25 Cold  Lots  
#> 3     4 Cold  Lots
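If staying with lapply is preferred, as in the question, the single-variable pattern can probably be extended by looping over the rows of a table of level combinations instead of over one vector of levels. This is only a sketch in base R: combos, df_2_all and df_2 are names made up for this example, and expand.grid is used here as one possible way to enumerate the combinations.

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# Every temp/clouds combination, one per row (3 x 3 = 9 rows)
combos <- expand.grid(temp = levels(df$temp),
                      clouds = levels(df$clouds),
                      stringsAsFactors = FALSE)

# One subset per combination, same idea as df_1A and df_1B
df_2_all <- lapply(seq_len(nrow(combos)), function(i) {
  df[df$temp == combos$temp[i] & df$clouds == combos$clouds[i], ]
})
names(df_2_all) <- paste(combos$temp, combos$clouds, sep = ".")

# Optionally discard combinations that never occur in the data
df_2 <- Filter(function(d) nrow(d) > 0, df_2_all)
length(df_2)  # 7
```

Naming the list elements makes it possible to pull out a single group directly, for example df_2[["Cold.Lots"]].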

