
R reshape data from long to wide and vice versa

I wrote two wrapper functions for cast and melt to bring my data from long to wide form and vice versa. However, I still struggle with the function reshape_wide which brings the data from long form into wide form.

Here are my example functions plus code to run them. I created a dummy data.frame in wide format, which I reshape into long format using my reshape_long function and then transform back to the original wide form using my reshape_wide function. However, the reshaping fails for a reason I cannot figure out. It seems the formula used in dcast is wrong.

library(reshape2)  # provides melt() and dcast()

reshape_long <- function(data, identifiers) {
    data_long <- melt(data, id.vars = identifiers, 
                            variable.name="name", value.name="value")
    data_long$value <- as.numeric(data_long$value)
    data_long <- data_long[!is.na(data_long$value), ]
    return(data_long)
}

reshape_wide <- function(data, identifiers, name) {
    if(is.null(identifiers)) {
        formula_wide <- as.formula(paste(paste(identifiers,collapse="+"), 
                                   "series ~ ", name))      
    } else {
        formula_wide <- as.formula(paste(paste(identifiers,collapse="+"), 
                                   "+ series ~ ", name))
    }
    series <- ave(1:nrow(data), data$name, FUN=function(x) { seq.int(along=x) }) 
    data <- cbind(data, series) 
    data_wide <- dcast(data, formula_wide, value.var="value")
    data_wide <- data_wide[,!(names(data_wide) %in% "series")]
    return(data_wide)
}


data <- data.frame(ID = rep("K", 6), Type = c(rep("A", 3), rep("B", 3)),
                   X = c(NA,NA,1,2,3,4), Y = 5:10, Z = c(NA,11,12,NA,14,NA))
data <- reshape_long(data, identifiers = c("ID", "Type"))
data
reshape_wide(data, identifiers = c("ID", "Type"), name="name")

Here is a link to my R output when I run the code above:

http://pastebin.com/ej8F9GnL

The problem is that in the Type column, B appears 5 times rather than the expected 3 times. Do you get the same data.frame?

Here is the R output from sessionInfo():

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] reshape2_1.2.1       outliers_0.14        lme4_0.999375-42    
 [4] Matrix_1.0-1         gregmisc_2.1.2       gplots_2.10.1       
 [7] KernSmooth_2.23-7    caTools_1.12         bitops_1.0-4.1      
[10] gtools_2.6.2         gmodels_2.15.1       gdata_2.8.2         
[13] lattice_0.20-0       dataframes2xls_0.4.5 RankProd_2.26.0     
[16] R.utils_1.9.3        R.oo_1.8.3           R.methodsS3_1.2.1   
[19] xlsx_0.3.0           xlsxjars_0.3.0       rJava_0.9-2         
[22] rj_1.0.0-3          

loaded via a namespace (and not attached):
[1] MASS_7.3-16   nlme_3.1-102  plyr_1.6      rj.gd_1.0.0-1 stats4_2.14.0
[6] stringr_0.5   tools_2.14.0 

The problem might be here:

series <- ave(1:nrow(data), data$name, FUN=function(x) { seq.int(along=x) }) 

You should get out of the habit of using "$" inside functions, since it does not evaluate its argument: data$name always looks for a column literally called "name", whatever the name argument holds. Use "[[" and pass the variable unquoted:

series <- ave(1:nrow(data), data[[name]], FUN=function(x) { seq.int(along=x) }) 

In this example it would not make a difference because name == "name", but if you tried to use it with any other value for name it would fail.
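
To illustrate the difference, here is a minimal sketch using a made-up data frame (df and col are hypothetical names, not taken from the question):

```r
# Hypothetical data frame, for illustration only
df <- data.frame(name = c("X", "Y"), variable = c("a", "b"),
                 stringsAsFactors = FALSE)
col <- "variable"   # column name held in a variable, as inside a function

by_dollar  <- df$col     # `$` looks for a column literally called "col"
by_bracket <- df[[col]]  # `[[` evaluates col and fetches the "variable" column

is.null(by_dollar)  # TRUE: there is no column named "col"
by_bracket          # "a" "b"
```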

The example cannot work: since ID and Type do not form a primary key (i.e., there are several rows with the same ID and Type), once the data is put in long format you no longer know whether two values come from the same row.
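
One way to check this, sketched here in base R with the data from the question (the variable name key is only for illustration), is to look for duplicated identifier combinations:

```r
d <- data.frame(
  ID = rep("K", 6),
  Type = c(rep("A", 3), rep("B", 3)),
  X = c(NA, NA, 1, 2, 3, 4),
  Y = 5:10,
  Z = c(NA, 11, 12, NA, 14, NA),
  stringsAsFactors = FALSE
)

# (ID, Type) alone: several rows share the same combination
key <- d[c("ID", "Type")]
anyDuplicated(key) > 0   # TRUE, so (ID, Type) is not a primary key

# Adding a row counter makes every identifier combination unique
d$row <- seq_len(nrow(d))
anyDuplicated(d[c("row", "ID", "Type")])   # 0, no duplicates
```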

Also, I am not sure what you are trying to do with your series column, but it does not seem to work.
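
To see where the series counter goes wrong, here is a rough base R sketch. It rebuilds the long data by hand instead of calling melt, so the helper names long and keys are assumptions for illustration. Because the counter restarts for each variable after the NA rows were dropped, the same series value ends up pointing at different original rows:

```r
d <- data.frame(
  ID = rep("K", 6),
  Type = c(rep("A", 3), rep("B", 3)),
  X = c(NA, NA, 1, 2, 3, 4),
  Y = 5:10,
  Z = c(NA, 11, 12, NA, 14, NA),
  stringsAsFactors = FALSE
)

# Stack X, Y, Z into long form and drop NAs, mimicking reshape_long
long <- do.call(rbind, lapply(c("X", "Y", "Z"), function(v) {
  data.frame(ID = d$ID, Type = d$Type, name = v,
             value = as.numeric(d[[v]]), stringsAsFactors = FALSE)
}))
long <- long[!is.na(long$value), ]

# Same counter as in reshape_wide: 1, 2, 3, ... within each variable
long$series <- ave(seq_len(nrow(long)), long$name, FUN = seq_along)

# Each distinct (Type, series) pair becomes one row of the wide result
keys <- unique(long[c("Type", "series")])
table(keys$Type)   # A: 3, B: 5
```

For X the counter values 2 to 4 fall on Type B, while for Y they are 4 to 6, so B ends up with five distinct series values, which would explain the five B rows in the question.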

library(reshape2)
d <- data.frame(
  ID = rep("K", 6), 
  Type = c(rep("A", 3), rep("B", 3)),
  X = c(NA,NA,1,2,3,4), 
  Y = 5:10, 
  Z = c(NA,11,12,NA,14,NA)
)
d$row <- seq_len(nrow(d))  # (row,ID,Type) is now a primary key
d
d1 <- reshape_long(d, identifiers = c("row", "ID", "Type"))
d1
dcast(d1, row + ID + Type ~ name) # Probably what you want
reshape_wide(d1, identifiers = c("row", "ID", "Type"), name="name")
