Here is my data.frame, called test:
strain variable value L1
1 AB1 n 582.00000 1
2 AB4 n 12.00000 1
3 CB4852 n 375.00000 1
4 CB4853 n 113.00000 1
5 CB4854 n 160.00000 1
This is a melted data.frame in which L1 runs from 1 to 30, with 78 variables for each L1 and 96 strains, for a grand total of 219,552 rows.
What I would like to do is take this data.frame (test) and create 30 (L1) x 78 (variable) = 2,340 new data.frames with the following orientation:
L1_variable   (the name of one data.frame)
              strain1  strain2  ...  strainN
variable x    value    value    ...  value
This creates a new data.frame for each combination of L1 and variable, holding the value of that variable in one column per strain.
These data.frames will then be passed to a function.
I think I need to write a function and apply it to test with ddply, but I do not know how to implement this.
Thanks for any and all help.
It's not necessary to create separate data frames. You can reshape your data frame as follows:
# creating sample data (extending your sample in order to illustrate the method)
df <- structure(list(strain = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), class = "factor"), variable = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("m", "n"), class = "factor"), value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189, 462, 72, 305, 183, 360, 142, 132, 75, 308, 216), L1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("strain", "variable", "value", "L1"), class = "data.frame", row.names = c(NA, -20L))
# transforming the data with the reshape2 package
library(reshape2)
df2 <- dcast(df, L1 + variable ~ strain, value.var="value")
# creating a variable with unique identifiers
df2$L1var <- paste0(df2$L1, df2$variable)
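If you prefer not to depend on reshape2, base R's stats::reshape() can produce the same wide layout. This is a minimal sketch that rebuilds the sample data in a more readable form; the object name wide and the tidy-up steps are illustrative choices, not part of the original answer. reshape() names the new columns value.AB1, value.AB4, ... and keeps rows in order of first appearance, which is why the prefix is stripped and the rows are re-sorted afterwards.

```r
# Sample data equivalent to the structure() call above
df <- data.frame(
  strain = factor(rep(c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), times = 4)),
  variable = factor(rep(c("n", "m", "n", "m"), each = 5), levels = c("m", "n")),
  value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189,
            462, 72, 305, 183, 360, 142, 132, 75, 308, 216),
  L1 = rep(1:2, each = 10)
)

# Long -> wide: one row per (L1, variable), one column per strain
wide <- reshape(df, idvar = c("L1", "variable"), timevar = "strain",
                direction = "wide")

# Drop the "value." prefix that reshape() adds to the new column names
names(wide) <- sub("^value\\.", "", names(wide))

# Sort rows like the dcast() result, reset row names, add the identifier
wide <- wide[order(wide$L1, wide$variable), ]
rownames(wide) <- NULL
wide$L1var <- paste0(wide$L1, wide$variable)
print(wide)
```

The values match the dcast() result; only the column order differs (the id columns come first in the order they had in df).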
This results in the following data frame (shown as dput() output):
df2 <- structure(list(L1 = c(1L, 1L, 2L, 2L), variable = structure(c(1L, 2L, 1L, 2L), .Label = c("m", "n"), class = "factor"), AB1 = c(753, 582, 142, 462), AB4 = c(92, 12, 132, 72), CB4852 = c(115, 375, 75, 305), CB4853 = c(163, 113, 308, 183), CB4854 = c(189, 160, 216, 360), L1var = c("1m", "1n", "2m", "2n")), .Names = c("L1", "variable", "AB1", "AB4", "CB4852", "CB4853", "CB4854", "L1var"), row.names = c(NA, -4L), class = "data.frame")
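The question asks for one table per L1/variable combination, with the variable as the row name and one column per strain. Each row of df2 already holds exactly that, so a single combination can be pulled out on demand instead of being stored separately. A minimal sketch; get_combo() is a hypothetical helper (not part of reshape2), and df2 is rebuilt literally from the dput() output above so the block runs on its own:

```r
# df2 as produced by dcast() above
df2 <- data.frame(
  L1 = c(1L, 1L, 2L, 2L),
  variable = factor(c("m", "n", "m", "n")),
  AB1 = c(753, 582, 142, 462),
  AB4 = c(92, 12, 132, 72),
  CB4852 = c(115, 375, 75, 305),
  CB4853 = c(163, 113, 308, 183),
  CB4854 = c(189, 160, 216, 360),
  L1var = c("1m", "1n", "2m", "2n"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one L1/variable combination as a one-row data
# frame whose row name is the variable and whose columns are the strains
get_combo <- function(wide, l1, var) {
  strain_cols <- setdiff(names(wide), c("L1", "variable", "L1var"))
  out <- wide[wide$L1 == l1 & wide$variable == var, strain_cols, drop = FALSE]
  rownames(out) <- as.character(var)
  out
}

one <- get_combo(df2, 1, "n")
print(one)
```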
If you want a separate file for each unique identifier, you can split df2 like this:
# split dataframe in list of dataframes
dfs <- split(df2, df2$L1var)
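Since the question mentions passing each piece to a function, the list returned by split() can go straight into lapply() or sapply(). A minimal sketch; summarise_strains() is a hypothetical stand-in for whatever analysis you actually need, and df2 is again rebuilt from the dput() output so the block is self-contained:

```r
# df2 as produced by dcast() above
df2 <- data.frame(
  L1 = c(1L, 1L, 2L, 2L),
  variable = factor(c("m", "n", "m", "n")),
  AB1 = c(753, 582, 142, 462),
  AB4 = c(92, 12, 132, 72),
  CB4852 = c(115, 375, 75, 305),
  CB4853 = c(163, 113, 308, 183),
  CB4854 = c(189, 160, 216, 360),
  L1var = c("1m", "1n", "2m", "2n"),
  stringsAsFactors = FALSE
)

# One data frame per unique identifier, named "1m", "1n", "2m", "2n"
dfs <- split(df2, df2$L1var)

strain_cols <- c("AB1", "AB4", "CB4852", "CB4853", "CB4854")

# Hypothetical analysis function: summarise the strain values of one piece
summarise_strains <- function(d) {
  vals <- unlist(d[, strain_cols])
  c(mean = mean(vals), max = max(vals))
}

# sapply() collects the results into a matrix: one column per identifier
res <- sapply(dfs, summarise_strains)
print(res)
```

Use lapply() instead of sapply() if your function returns something that should not be simplified into a matrix, such as a model object.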
# save each data frame in the list to a separate file
# (dfs[[i]] extracts the data frame itself; dfs[i] would be a one-element list)
invisible(lapply(seq_along(dfs), function(i) write.csv(dfs[[i]], file = paste0(names(dfs)[i], ".csv"))))
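If you really do want one data frame per L1/variable combination, named L1_variable as in the question, base R's split() can build them directly from the long data. This swaps in split() plus lapply() for the ddply() idea from the question (plyr's dlply(), which returns a list, would be the closer plyr equivalent). A minimal sketch on the sample data; per_combo is an illustrative name:

```r
# Sample data equivalent to the structure() call above
df <- data.frame(
  strain = factor(rep(c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), times = 4)),
  variable = factor(rep(c("n", "m", "n", "m"), each = 5), levels = c("m", "n")),
  value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189,
            462, 72, 305, 183, 360, 142, 132, 75, 308, 216),
  L1 = rep(1:2, each = 10)
)

# Split by every L1/variable combination; sep = "_" gives names such as
# "1_n", and drop = TRUE skips combinations that have no rows
pieces <- split(df, list(df$L1, df$variable), sep = "_", drop = TRUE)

# Turn each piece into a one-row data frame: row name = variable,
# one column per strain
per_combo <- lapply(pieces, function(d) {
  d <- d[order(d$strain), ]
  out <- as.data.frame(t(d$value))
  names(out) <- as.character(d$strain)
  rownames(out) <- as.character(d$variable[1])
  out
})

print(names(per_combo))
print(per_combo[["1_n"]])
```

Keeping the pieces in a named list is usually easier than creating 2,340 separate objects, since names such as 1_n are not syntactic R names and would need backticks.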