Transpose by column in R using plyr

Here is my data.frame, called test:

    strain  variable    value       L1
1   AB1            n    582.00000   1
2   AB4            n    12.00000    1
3   CB4852         n    375.00000   1
4   CB4853         n    113.00000   1
5   CB4854         n    160.00000   1

This is a melted data.frame in which L1 runs from 1 to 30, with 78 variables for each L1 and 96 strains, for a grand total of 219,552 rows.

What I would like to do is take this data.frame (test) and create one new data.frame for each combination of L1 (30) and variable (78), with the following orientation:

L1_variable (this would be the name of one df)

               strain1   strain2 .... strainN
    row.name     value     value        value
    variable x   value     value        value

This creates a new df for each combination of L1 and variable, holding the value of that variable in one column per strain.

These will then be passed into a function.

I think I need to write a function and then use ddply on my df test, but I do not know how to implement this.

Thanks for any and all help.

It's not necessary to create separate data frames. You can reshape your data frame as follows:

# creating sample data (extending your sample in order to illustrate the method)
df <- structure(list(strain = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), class = "factor"), variable = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("m", "n"), class = "factor"), value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189, 462, 72, 305, 183, 360, 142, 132, 75, 308, 216), L1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("strain", "variable", "value", "L1"), class = "data.frame", row.names = c(NA, -20L))

# transforming the data with the reshape2 package
library(reshape2)
df2 <- dcast(df, L1 + variable ~ strain, value.var="value")

# creating a variable with unique identifiers
df2$L1var <- paste0(df2$L1, df2$variable)

This results in the following data frame:

df2 <- structure(list(L1 = c(1L, 1L, 2L, 2L), variable = structure(c(1L, 2L, 1L, 2L), .Label = c("m", "n"), class = "factor"), AB1 = c(753, 582, 142, 462), AB4 = c(92, 12, 132, 72), CB4852 = c(115, 375, 75, 305), CB4853 = c(163, 113, 308, 183), CB4854 = c(189, 160, 216, 360), L1var = c("1m", "1n", "2m", "2n")), .Names = c("L1", "variable", "AB1", "AB4", "CB4852", "CB4853", "CB4854", "L1var"), row.names = c(NA, -4L), class = "data.frame")
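If you do still want one small data frame per L1/variable combination, in the orientation from the question, something along these lines might work in base R. This is only a sketch on a cut-down sample with three strains; the names groups and wide_list are made up for the example, and the "L1_variable" naming is taken from the question.

```r
# Small long-format sample in the same shape as `test` (three strains only)
test <- data.frame(
  strain   = rep(c("AB1", "AB4", "CB4852"), times = 4),
  variable = rep(c("n", "m", "n", "m"), each = 3),
  value    = c(582, 12, 375, 753, 92, 115, 462, 72, 305, 142, 132, 75),
  L1       = rep(c(1L, 2L), each = 6),
  stringsAsFactors = FALSE
)

# One group per L1/variable combination, named "<L1>_<variable>"
groups <- split(test, paste(test$L1, test$variable, sep = "_"))

# Turn each group into a one-row data frame with one column per strain;
# the row name is the variable that the group belongs to
wide_list <- lapply(groups, function(d) {
  row <- as.data.frame(t(setNames(d$value, d$strain)))
  rownames(row) <- d$variable[1]
  row
})

names(wide_list)
wide_list[["1_n"]]
```

Each element of wide_list can then be looked up by name, e.g. wide_list[["2_m"]]. This assumes every strain appears at most once per combination; duplicated strains would produce duplicated column names.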

If you want a separate file for each unique identifier, you can split df2 like this:

# split dataframe in list of dataframes
dfs <- split(df2, df2$L1var)

# save each data frame in the list to a separate file
# (dfs[[i]] extracts the data frame itself; dfs[i] would pass a one-element list)
invisible(lapply(seq_along(dfs), function(i) write.csv(dfs[[i]], file = paste0(names(dfs)[i], ".csv"))))
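Since the question mentions feeding the pieces into a function, you may not need the files at all: the list from split can be passed to lapply or sapply directly. A minimal sketch, assuming the wide layout shown above; strain_mean is a hypothetical stand-in for whatever function you actually want to run, and strain_cols is a helper name introduced here.

```r
# Wide table in the layout shown above (values copied from the sample data)
df2 <- data.frame(
  L1       = c(1L, 1L, 2L, 2L),
  variable = c("m", "n", "m", "n"),
  AB1      = c(753, 582, 142, 462),
  AB4      = c(92, 12, 132, 72),
  CB4852   = c(115, 375, 75, 305),
  CB4853   = c(163, 113, 308, 183),
  CB4854   = c(189, 160, 216, 360),
  stringsAsFactors = FALSE
)
df2$L1var <- paste0(df2$L1, df2$variable)

# Columns that hold strain values (everything except the identifiers)
strain_cols <- setdiff(names(df2), c("L1", "variable", "L1var"))

# Hypothetical placeholder for the real function: mean across strains
strain_mean <- function(d) mean(unlist(d[strain_cols]))

# Split by identifier and apply the function to every piece
dfs <- split(df2, df2$L1var)
results <- sapply(dfs, strain_mean)
results
```

The result is a named vector with one entry per identifier, so results[["1n"]] gives the value for L1 = 1 and variable n.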


 