Splitting a list of dataframes into multiple lists based on a factor in each dataframe

I have a list of data frames with a time series of (x, y) coordinates. Each data frame also has a specific variable - trial_option - which I want to use to split my list of data frames into multiple smaller lists. Each smaller list will contain all the data frames with one trial_option factor.

df1 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df2 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df3 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df4 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df5 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
df6 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)

So I want to split mylist into 3 smaller lists: mylistA, mylistB, mylistC. I thought I could use small_list <- lapply(list, subset, trial_option == A) and repeat that for each trial_option, but that did not return what I wanted. I also feel that repeating it for each trial_option would be tedious and not good practice. I haven't been able to find a suitable answer by googling yet.

Also, once I have these subset lists, I'll be doing some data wrangling and I then want to combine these smaller lists all back into a big list. Each subset of trial_option data frames needs to have separate data wrangling done, hence why I want to split the master list.

Any help is appreciated.

All the data frames can be combined into one and then split on trial_option:

df <- rbind(df1, df2, df3, df4, df5, df6)
split(x = df, f = df$trial_option)
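
If the individual data frames need to stay separate, as the question asks, one possible approach is to call split on the list itself, using one key per data frame. This is only a sketch: make_df is a hypothetical helper that rebuilds the question's example data, and the code assumes trial_option is constant within each data frame. The last line shows one way to flatten the sublists back into a single list after the per-group wrangling.

```r
# Hypothetical helper that rebuilds the example data from the question
make_df <- function(opt) {
  data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10),
             trial_option = rep(opt, 10))
}
mylist <- list(df1 = make_df("A"), df2 = make_df("A"),
               df3 = make_df("B"), df4 = make_df("B"),
               df5 = make_df("C"), df6 = make_df("C"))

# One key per data frame (assumes trial_option is constant within each one)
keys <- vapply(mylist, function(d) as.character(d$trial_option[1]), character(1))

# split() also works on lists: the result is a list of lists named A, B, C
sublists <- split(mylist, keys)

# ... per-group wrangling on sublists$A, sublists$B, sublists$C goes here ...

# Flatten back into one list; unname() keeps the element names df1..df6
biglist <- do.call(c, unname(sublists))
```

Here sublists$A plays the role of mylistA, so no separate variable per trial_option is needed.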

Whenever you need to process the splits of a data frame, consider by, the object-oriented wrapper for tapply. Like split, it creates a named list of subset data frames from one or more factors, but by also lets you process each subset data frame in the same call, without a separate lapply or for loop afterwards.

mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)

complete_df <- do.call(rbind, mylist)

# NAMED LIST OF DFS (NAMES ARE UNIQUE VALUES OF trial_option: A, B, C)
by_list <- by(complete_df, complete_df$trial_option, FUN=function(d) {
    # DATA WRANGLING WHERE PARAMETER, d, IS SUBSETTED DATAFRAME
    new_d <- d   # ... PLACEHOLDER: REPLACE WITH THE PROCESSING FOR THIS GROUP
    # RETURN A DATAFRAME AFTER PROCESSING
    return(new_d)
})

# ROW BIND ALL DF ELEMENTS (ASSUMES EACH HAVE SAME colnames() AND ncol())
new_complete_df <- do.call(rbind, by_list)   
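
As a concrete illustration of the by pattern, the sketch below uses a made-up wrangling step: centring x within each trial_option group. The step, the x_centered column name, and the example data are all assumptions for demonstration and are not part of the original question.

```r
set.seed(1)  # only so the random example data is reproducible

# Stand-in for the combined data frame: 20 rows per trial_option
complete_df <- data.frame(x = runif(60, -10, 10), y = runif(60, -10, 10),
                          trial_option = rep(c("A", "B", "C"), each = 20))

by_list <- by(complete_df, complete_df$trial_option, FUN = function(d) {
  # Hypothetical wrangling: centre x within the current group
  d$x_centered <- d$x - mean(d$x)
  d
})

# Row bind the processed groups back into a single data frame
new_complete_df <- do.call(rbind, by_list)
```

Because every group returns the same columns, the final rbind works without further adjustment.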
